Differentially private random forest with high utility

Rana, Santu, Gupta, Sunil and Venkatesh, Svetha 2015, Differentially private random forest with high utility, in ICDM 2015: Proceedings of the 15th IEEE International Conference on Data Mining, IEEE, Piscataway, N.J., pp. 955-960, doi: 10.1109/ICDM.2015.76.


Title Differentially private random forest with high utility
Author(s) Rana, Santu (ORCID iD: orcid.org/0000-0003-2247-850X)
Gupta, Sunil (ORCID iD: orcid.org/0000-0002-3308-1930)
Venkatesh, Svetha (ORCID iD: orcid.org/0000-0001-8675-6631)
Conference name IEEE International Conference on Data Mining (15th : 2015 : Atlantic City, New Jersey)
Conference location Atlantic City, New Jersey
Conference dates 14-17 Nov. 2015
Title of proceedings ICDM 2015: Proceedings of the 15th IEEE International Conference on Data Mining
Editor(s) Aggarwal, Charu
Zhou, Zhi-Hua
Tuzhilin, Alexander
Xiong, Hui
Wu, Xindong
Publication date 2015
Start page 955
End page 960
Total pages 6
Publisher IEEE
Place of publication Piscataway, N.J.
Keyword(s) differential privacy
Decision trees
random forest
privacy preserving data mining
Summary Privacy-preserving data mining has become an active research focus in domains where data are sensitive and personal in nature. For example, highly sensitive digital repositories of medical or financial records offer enormous value for risk prediction and decision making. However, prediction models derived from such repositories should maintain strict privacy of individuals. We propose a novel random forest algorithm under the framework of differential privacy. Unlike previous works that strictly follow differential privacy and keep the complete data distribution approximately invariant to a change in one data instance, we keep only the necessary statistics (e.g. variance of the estimate) invariant. This relaxation results in significantly higher utility. To realize our approach, we propose a novel differentially private decision tree induction algorithm and use it to create an ensemble of decision trees. We also propose feasible adversary models to infer the attribute and class label of unknown data in the presence of knowledge of all other data. Under these adversary models, we derive bounds on the maximum number of trees that are allowed in the ensemble while maintaining privacy. We focus on the binary classification problem and demonstrate our approach on four real-world datasets. Compared to the existing privacy-preserving approaches, we achieve significantly higher utility.
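For context, the strict baseline that the summary contrasts against can be sketched as follows. This is not the authors' relaxed algorithm, which this record does not specify; it is a minimal illustration of the standard Laplace mechanism applied to the class counts in one leaf, with the privacy budget split evenly across trees by sequential composition. The function names, the sensitivity of 1 (neighbouring datasets differ by adding or removing one record), and the example counts are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_leaf_counts(counts, epsilon, rng, sensitivity=1.0):
    # Standard Laplace mechanism: noise scale = sensitivity / epsilon.
    # sensitivity=1.0 assumes add/remove-one-record neighbours (assumption).
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def per_tree_epsilon(total_epsilon, n_trees):
    # Sequential composition: trees built on the same data share the budget,
    # so more trees means more noise per tree.
    return total_epsilon / n_trees


def majority_class(noisy_counts):
    # Predict the class with the largest (noisy) count.
    return max(range(len(noisy_counts)), key=lambda k: noisy_counts[k])


rng = random.Random(0)
eps_tree = per_tree_epsilon(total_epsilon=1.0, n_trees=10)
released = noisy_leaf_counts([30, 70], eps_tree, rng)  # hypothetical leaf counts
print(majority_class(released))
```

Under this baseline, each extra tree shrinks the per-tree budget and so inflates the noise, which is one reason the number of trees in a private ensemble has to be bounded.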
ISBN 9781467395038
Language eng
DOI 10.1109/ICDM.2015.76
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category E1 Full written paper - refereed
ERA Research output type E Conference publication
Copyright notice ©2015, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30081982

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Cited 21 times in TR Web of Science; cited 33 times in Scopus
Access statistics: 677 abstract views, 4 file downloads
Created: Mon, 07 Mar 2016, 15:33:58 EST
