Discovering frequent sets from data streams with CPU constraint

Dang, Xuan Hong, Ng, Wee-Keong, Ong, Kok-Leong and Lee, Vincent C. S. 2007, Discovering frequent sets from data streams with CPU constraint, in Data mining and analytics 2007 : proceedings of the sixth Australasian Data Mining Conference (AusDM2007), Gold Coast, Australia, 3-4 December, 2007, Australian Computer Society, Sydney, N.S.W., pp. 117-124.


Title Discovering frequent sets from data streams with CPU constraint
Author(s) Dang, Xuan Hong
Ng, Wee-Keong
Ong, Kok-Leong
Lee, Vincent C. S.
Conference name Australasian Data Mining Conference (6th : 2007 : Gold Coast, Queensland)
Conference location Gold Coast, Queensland
Conference dates 3-4 December 2007
Title of proceedings Data mining and analytics 2007 : proceedings of the sixth Australasian Data Mining Conference (AusDM2007), Gold Coast, Australia, 3-4 December, 2007
Editor(s) Christen, Peter
Kennedy, Paul J.
Li, Jiuyong
Kolyshkina, Inna
Williams, Graham J.
Publication date 2007
Conference series Australasian Data Mining Conference
Start page 117
End page 124
Publisher Australian Computer Society
Place of publication Sydney, N.S.W.
Keyword(s) data stream
frequent set mining
online algorithm
load shedding
error approximation
Summary Data streams are usually generated online and are characterized by huge volume, rapid and unpredictable arrival rates, and fast-changing data characteristics. It has hence been recognized that mining streaming data requires the problem of limited computational resources to be adequately addressed. Since the arrival rate of a data stream can increase significantly and exceed the CPU capacity, the mining machinery must adapt to this change to guarantee the timeliness of the results. We present an online algorithm that approximates the set of frequent patterns in a sliding window over the underlying data stream, given an a priori CPU capacity. The algorithm automatically detects overload situations and adaptively sheds unprocessed data to guarantee timely results. We prove theoretically, using probabilistic and deterministic techniques, that the error in the output is bounded within a pre-specified threshold. Empirical results on various datasets also confirm the feasibility of our proposal.
ISBN 9781920682514
1920682511
Language eng
Field of Research 080604 Database Management
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category E1 Full written paper - refereed
Copyright notice ©2007 Australian Computer Society, Inc.
Persistent URL http://hdl.handle.net/10536/DRO/DU:30008069

Document type: Conference Paper
Collection: School of Engineering and Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Mon, 29 Sep 2008, 09:04:11 EST
