Mining frequent itemsets in distorted databases with granular computing
Wang, Jinlong, Xu, Congfu and Li, Gang 2009, Mining frequent itemsets in distorted databases with granular computing, International Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 4, pp. 825-846.
Title
Mining frequent itemsets in distorted databases with granular computing
Data perturbation is a popular method for privacy-preserving data mining. However, mining a distorted database incurs far more overhead than mining the original database. In this paper, we present the GrC-FIM algorithm to address the efficiency problem in mining frequent itemsets from distorted databases. Two measures are introduced to overcome the weaknesses of existing work. First, the concept of an independent granule is introduced, and granule inference is used to distinguish non-independent itemsets from independent itemsets. We further prove that the support counts of non-independent itemsets can be derived directly from their subitemsets, so that the error-prone reconstruction process can be avoided for them. This can improve the efficiency of the algorithm and yield more accurate results. Second, through the granular-bitmap representation, support counts can be calculated efficiently. Empirical results on representative synthetic and real-world databases indicate that the proposed GrC-FIM algorithm outperforms the popular EMASK algorithm in both efficiency and support count reconstruction accuracy.
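The abstract's point about bitmap-based support counting can be illustrated with a minimal Python sketch. This is not the GrC-FIM algorithm or its granular-bitmap structure, whose details the abstract does not give; it only shows the general technique of storing one bitmap per item and intersecting bitmaps to count support, on an ordinary (undistorted) toy database. The function names `build_bitmaps` and `support_count` and the sample transactions are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def build_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """Map each item to an integer whose bit t is set when transaction t contains the item."""
    bitmaps: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset: Iterable[str], bitmaps: Dict[str, int], n_transactions: int) -> int:
    """Count transactions containing every item in the itemset via bitwise AND."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # an unseen item clears every bit
    return bin(mask).count("1")       # population count of the intersection


# Hypothetical toy database of four transactions (illustrative only).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
bitmaps = build_bitmaps(transactions)
```

With these four transactions, `support_count({"a", "c"}, bitmaps, len(transactions))` returns 3. Each additional item costs one bitwise AND over the transaction bitmap, which is why bitmap layouts are a common choice for support counting; mining a distorted database would additionally require estimating the original support counts, which this sketch does not attempt.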
Language
eng
Field of Research
080109 Pattern Recognition and Data Mining
Socio Economic Objective
890205 Information Processing Services (incl. Data Entry and Capture)