A study on the accuracy of frequency measures and its impact on knowledge discovery in single sequences
conference contribution
posted on 2010-01-01, 00:00 authored by Min Gan, Honghua Dai
In knowledge discovery in single sequences, different results can be discovered from the same sequence when different frequency measures are adopted. This naturally raises several questions: (1) Do these frequency measures reflect actual frequencies accurately? (2) What impact do frequency measures have on the discovered knowledge? (3) Are the discovered results accurate and reliable? (4) Which measures are appropriate for reflecting frequencies accurately? In this paper, taking three major factors into account (anti-monotonicity, maximum-frequency and window-width restriction), we identify inaccuracies inherent in seven existing frequency measures and investigate their impact on the soundness and completeness of two kinds of knowledge discovered from single sequences: frequent episodes and episode rules. To obtain more accurate frequencies and knowledge, we provide three recommendations for defining appropriate frequency measures. Following these recommendations, we introduce a more appropriate frequency measure. Empirical evaluation reveals the inaccuracies and verifies our findings.
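As a rough illustration of the abstract's opening claim, the sketch below counts one serial episode in a toy sequence under two frequency measures that are common in the episode-mining literature: a fixed-width sliding-window count and a greedy count of non-overlapped occurrences. The function names, the toy data, and the choice of these two measures are assumptions made for illustration only; the abstract does not say which seven measures the paper analyses or how it defines them.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (integer timestamp, event type); hypothetical representation


def occurs_in(events: Sequence[Event], episode: Sequence[str]) -> bool:
    """Return True if the serial episode appears as a subsequence of events."""
    i = 0
    for _, etype in events:
        if i < len(episode) and etype == episode[i]:
            i += 1
    return i == len(episode)


def window_frequency(seq: List[Event], episode: Sequence[str], width: int) -> int:
    """Count sliding windows [start, start + width) that contain the episode.

    Windows range from the first one that touches the sequence to the last
    one that still does (an assumed convention for this sketch).
    """
    first, last = seq[0][0], seq[-1][0]
    count = 0
    for start in range(first - width + 1, last + 1):
        window = [(t, e) for t, e in seq if start <= t < start + width]
        if occurs_in(window, episode):
            count += 1
    return count


def non_overlapped_frequency(seq: List[Event], episode: Sequence[str]) -> int:
    """Greedily count occurrences of a non-empty serial episode that do not overlap."""
    count, i = 0, 0
    for _, etype in seq:
        if etype == episode[i]:
            i += 1
            if i == len(episode):  # one full occurrence completed
                count += 1
                i = 0              # start looking for the next one
    return count


# Toy sequence (made up): A at t=1 and t=2, B at t=3 and t=4.
toy_seq = [(1, "A"), (2, "A"), (3, "B"), (4, "B")]
toy_episode = ("A", "B")

by_windows = window_frequency(toy_seq, toy_episode, width=3)
by_occurrences = non_overlapped_frequency(toy_seq, toy_episode)
```

On this toy input the window-based count is 2 (the windows starting at t=1 and t=2) while the non-overlapped count is 1, so the same episode in the same sequence gets a different frequency depending on the measure, and a support threshold of 2 would accept it under one measure and reject it under the other.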
Event
International Conference on Data Mining Workshops (10th : 2010 : Sydney, N.S.W.)
Pagination
859 - 866
Publisher
IEEE Computer Society
Location
Sydney, NSW
Place of publication
Sydney, NSW
Start date
2010-12-14
ISBN-13
9780769542577
Language
eng
Publication classification
E1 Full written paper - refereed
Copyright notice
2010, IEEE
Editor/Contributor(s)
W Fan, W Hsu, G Webb, B Liu, C Zhang, D Gunopulos, X Wu
Title of proceedings
ICDMW 2010 : Proceedings of 10th IEEE International Conference on Data Mining Workshops