Deakin University

A study on the accuracy of frequency measures and its impact on knowledge discovery in single sequences

conference contribution
posted on 2010-01-01, authored by Min Gan, Honghua Dai
In knowledge discovery in single sequences, different results could be discovered from the same sequence when different frequency measures are adopted. It is natural to raise such questions as (1) do these frequency measures reflect actual frequencies accurately? (2) what impacts do frequency measures have on discovered knowledge? (3) are discovered results accurate and reliable? and (4) which measures are appropriate for reflecting frequencies accurately? In this paper, taking three major factors (anti-monotonicity, maximum-frequency and window-width restriction) into account, we identify inaccuracies inherent in seven existing frequency measures, and investigate their impacts on the soundness and completeness of two kinds of knowledge, frequent episodes and episode rules, discovered from single sequences. In order to obtain more accurate frequencies and knowledge, we provide three recommendations for defining appropriate frequency measures. Following the recommendations, we introduce a more appropriate frequency measure. Empirical evaluation reveals the inaccuracies and verifies our findings.

Event

International Conference on Data Mining Workshops (10th : 2010 : Sydney, N.S.W.)

Pagination

859 - 866

Publisher

IEEE Computer Society

Location

Sydney, NSW

Place of publication

Sydney, NSW

Start date

2010-12-14

ISBN-13

9780769542577

Language

eng

Publication classification

E1 Full written paper - refereed

Copyright notice

2010, IEEE

Editor/Contributor(s)

W Fan, W Hsu, G Webb, B Liu, C Zhang, D Gunopulos, X Wu

Title of proceedings

ICDMW 2010 : Proceedings of 10th IEEE International Conference on Data Mining Workshops
