File(s) under permanent embargo
Differentially private random forest with high utility
conference contribution
posted on 2015-01-01, 00:00 authored by Santu Rana, Sunil Gupta, Svetha Venkatesh
Privacy-preserving data mining has become an active focus of the research community in domains where data are sensitive and personal in nature. For example, highly sensitive digital repositories of medical or financial records offer enormous value for risk prediction and decision making. However, prediction models derived from such repositories should maintain strict privacy of individuals. We propose a novel random forest algorithm under the framework of differential privacy. Unlike previous works that strictly follow differential privacy and keep the complete data distribution approximately invariant to change in one data instance, we only keep the necessary statistics (e.g. variance of the estimate) invariant. This relaxation results in significantly higher utility. To realize our approach, we propose a novel differentially private decision tree induction algorithm and use it to create an ensemble of decision trees. We also propose feasible adversary models to infer the attribute and class label of unknown data in the presence of knowledge of all other data. Under these adversary models, we derive bounds on the maximum number of trees that are allowed in the ensemble while maintaining privacy. We focus on the binary classification problem and demonstrate our approach on four real-world datasets. Compared with existing privacy-preserving approaches, we achieve significantly higher utility.
Event
IEEE International Conference on Data Mining (15th : 2015 : Atlantic City, New Jersey)
Pagination
955 - 960
Publisher
IEEE
Location
Atlantic City, New Jersey
Place of publication
Piscataway, N.J.
Start date
2015-11-14
End date
2015-11-17
ISBN-13
9781467395038
Language
eng
Publication classification
E Conference publication; E1 Full written paper - refereed
Copyright notice
2015, IEEE
Editor/Contributor(s)
C Aggarwal, Z Zhou, A Tuzhilin, H Xiong, X Wu
Title of proceedings
ICDM 2015: Proceedings of the 15th IEEE International Conference on Data Mining