Deakin University

File(s) under permanent embargo

Mining frequent itemsets in distorted databases with granular computing

journal contribution
posted on 2009-06-01, 00:00 authored by J Wang, C Xu, Gang Li
Data perturbation is a popular method for achieving privacy-preserving data mining. However, mining distorted databases incurs substantial overhead compared with mining the original databases. In this paper, we present the GrC-FIM algorithm to address the efficiency problem in mining frequent itemsets from distorted databases. Two measures are introduced to overcome the weaknesses of existing work. First, the concept of an independent granule is introduced, and granule inference is used to distinguish between independent and non-independent itemsets. We further prove that the support counts of non-independent itemsets can be derived directly from their subitemsets, so that the error-prone reconstruction process can be avoided. This can improve the efficiency of the algorithm and yield more accurate results. Second, through the granular-bitmap representation, support counts can be calculated efficiently. Empirical results on representative synthetic and real-world databases indicate that the proposed GrC-FIM algorithm outperforms the popular EMASK algorithm in both efficiency and support count reconstruction accuracy.
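
As a rough illustration of why a bitmap representation makes support counting cheap, the sketch below stores one bitmap per item (bit i is set when transaction i contains the item) and counts the support of an itemset by AND-ing the bitmaps of its items. This is a generic bitmap technique, not the GrC-FIM algorithm or its granular-bitmap structure, whose details are not given in the abstract. The function names `build_bitmaps` and `support_count` and the toy transactions are assumptions made for this example, and the sketch works on undistorted data only.

```python
from typing import Dict, Iterable, List


def build_bitmaps(transactions: List[Iterable[str]]) -> Dict[str, int]:
    """Map each item to an integer whose bit i is 1 if transaction i contains it.

    Hypothetical helper for illustration; Python ints serve as arbitrary-length bitmaps.
    """
    bitmaps: Dict[str, int] = {}
    for row, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << row)
    return bitmaps


def support_count(itemset: Iterable[str], bitmaps: Dict[str, int], n_rows: int) -> int:
    """Count the transactions that contain every item in `itemset`."""
    mask = (1 << n_rows) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # keep only rows that also contain this item
    return bin(mask).count("1")  # number of set bits = support count


# Toy database of four transactions (made up for this example).
transactions = [["a", "b", "c"], ["a", "c"], ["b"], ["a", "b"]]
bitmaps = build_bitmaps(transactions)
ab_support = support_count(["a", "b"], bitmaps, len(transactions))
```

Each additional item in the itemset costs one bitwise AND over the whole bitmap, so no rescan of the transactions is needed once the per-item bitmaps are built.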

History

Journal

International Journal of Pattern Recognition and Artificial Intelligence

Volume

23

Issue

4

Pagination

825 - 846

Publisher

World Scientific Publishing Co.

Location

Toh Tuck Link, Singapore

ISSN

0218-0014

eISSN

1793-6381

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

World Scientific Publishing Company