Openly accessible

A novel parallel algorithm for frequent itemsets mining in massive small files datasets

Xia, D., Rong, Z., Zhou, Y., Li, Y., Shen, Y. and Zhang, Z. 2014, A novel parallel algorithm for frequent itemsets mining in massive small files datasets, ICIC express letters, part B: applications, vol. 5, no. 2, pp. 459-466.

Attached Files
Name Description MIMEType Size Downloads
zhang-novelparallelalgorithm-2014.pdf Published version application/pdf 3.16MB 69

Title A novel parallel algorithm for frequent itemsets mining in massive small files datasets
Author(s) Xia, D.
Rong, Z.
Zhou, Y.
Li, Y.
Shen, Y.
Zhang, Z. (ORCID iD: orcid.org/0000-0002-8721-9333)
Journal name ICIC express letters, part B: applications
Volume number 5
Issue number 2
Start page 459
End page 466
Total pages 8
Publisher ICIC International
Place of publication China
Publication date 2014
ISSN 2185-2766
Keyword(s) Big data analysis
Frequent itemsets mining
Hadoop MapReduce
Parallel FP-growth
Small files problem
Summary In big data analysis, frequent itemsets mining plays a key role in mining associations, correlations and causality. Some traditional frequent itemsets mining algorithms cannot handle massive small files datasets effectively: they suffer from high memory cost, high I/O overhead, and low computing performance. We therefore propose a novel parallel frequent itemsets mining algorithm based on the FP-Growth algorithm and discuss its applications in this paper. First, we introduce a small files processing strategy for massive small files datasets to compensate for the low read-write speed and low processing efficiency in Hadoop. Moreover, we redesign the FP-Growth algorithm with MapReduce to run in parallel, thereby improving the overall performance of frequent itemsets mining. Finally, we apply the proposed algorithm to the association analysis of data from the national college entrance examination and admission of China. The experimental results show that the proposed algorithm is feasible and valid, achieves good speedup and higher mining efficiency, and can meet the actual requirements of frequent itemsets mining for massive small files datasets. © 2014 ISSN 2185-2766.
Language eng
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2014, ICIC International
Free to Read? Yes
Persistent URL http://hdl.handle.net/10536/DRO/DU:30072371
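
Illustrative sketch (not from the paper): the Summary mentions two ideas, a small files processing strategy and a MapReduce redesign of FP-Growth. The plain-Python example below is only a rough picture of how the first counting pass of such a pipeline might look, assuming that small files are grouped into larger splits and that item supports are counted per split and then merged. All function names, the batching rule, and the toy data are hypothetical; the record does not describe the authors' actual implementation, and the FP-tree construction and mining stages are not shown.

```python
from collections import Counter
from functools import reduce
from itertools import islice


def batch_small_files(files, batch_size):
    """Group many small inputs into larger splits (hypothetical merging rule)."""
    it = iter(files)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        # Each file is a list of transactions; a split concatenates them.
        yield [txn for f in batch for txn in f]


def map_count(split):
    """Map step: count the transactions in one split that contain each item."""
    counts = Counter()
    for txn in split:
        counts.update(set(txn))  # set() so an item counts once per transaction
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial count tables."""
    return a + b


def frequent_items(files, min_support, batch_size=2):
    """Return items whose global support count is at least min_support."""
    partials = map(map_count, batch_small_files(files, batch_size))
    total = reduce(reduce_counts, partials, Counter())
    return {item: n for item, n in total.items() if n >= min_support}


# Toy data (assumed): three "small files", each holding a few transactions.
small_files = [
    [["a", "b"], ["b", "c"]],
    [["a", "b", "c"]],
    [["b"], ["a", "c"]],
]
result = frequent_items(small_files, min_support=3)
```

In a real Hadoop job the map and reduce functions would run on separate workers; here they run sequentially, which is enough to show how the per-split counts combine into global supports.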

Document type: Journal Article
Collections: School of Information Technology
Open Access Collection
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Citation counts: TR Web of Science Citation Count  Cited 0 times in TR Web of Science
Scopus Citation Count Cited 3 times in Scopus
Access Statistics: 182 Abstract Views, 70 File Downloads  -  Detailed Statistics
Created: Thu, 16 Apr 2015, 13:48:57 EST
