Mining frequent itemsets in distorted databases with granular computing

Wang, J; Xu, C; Li, Gang

journal contribution

posted on 2009-06-01, 00:00 authored by J Wang, C Xu, Gang Li

Data perturbation is a popular method to achieve privacy-preserving data mining. However, distorted databases impose enormous overheads on mining algorithms compared to original databases. In this paper, we present the GrC-FIM algorithm to address the efficiency problem in mining frequent itemsets from distorted databases. Two measures are introduced to overcome the weaknesses of existing work. First, the concept of an independent granule is introduced, and granule inference is used to distinguish between non-independent itemsets and independent itemsets. We further prove that the support counts of non-independent itemsets can be directly derived from subitemsets, so that the error-prone reconstruction process can be avoided. This could improve the efficiency of the algorithm and yield more accurate results. Second, through the granular-bitmap representation, the support counts can be calculated efficiently. The empirical results on representative synthetic and real-world databases indicate that the proposed GrC-FIM algorithm outperforms the popular EMASK algorithm in both efficiency and support-count reconstruction accuracy.
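
The abstract does not describe the layout of the granular-bitmap representation, so the following is only a minimal sketch of generic bitmap-based support counting, the general technique such representations build on. It is not GrC-FIM itself: it works on plain, undistorted transactions and performs no granule inference or support reconstruction. The function names build_bitmaps and support_count and the sample transactions are hypothetical, chosen for illustration.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it.

    Assumption: one bitmap per item; the paper's actual granule layout may differ.
    """
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(bitmaps, itemset):
    """Count the transactions that contain every item of a non-empty itemset."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    combined = bitmaps.get(items[0], 0)
    for item in items[1:]:
        # Bitwise AND keeps only the transactions shared by all items so far.
        combined &= bitmaps.get(item, 0)
    # The number of set bits is the support count.
    return bin(combined).count("1")


# Hypothetical toy database of four transactions.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitmaps = build_bitmaps(transactions)
print(support_count(bitmaps, ["a", "b"]))
```

Under these assumptions, the support of an itemset costs one AND per item plus a bit count, rather than a scan over every transaction, which is the usual motivation for bitmap layouts in frequent itemset mining.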

History

Journal

International Journal of Pattern Recognition and Artificial Intelligence

Volume

23

Issue

4

Pagination

825 - 846

Publisher

World Scientific Publishing Co.

Location

Toh Tuck Link, Singapore

ISSN

0218-0014

eISSN

1793-6381

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

World Scientific Publishing Company

Keywords

granular computing; data mining; frequent itemset; granule inference; Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science; INFORMATION GRANULATION; Artificial Intelligence and Image Processing
