Differentially private query learning: from data publishing to model publishing
Version 2 2024-06-04, 01:53
Version 1 2017-10-25, 19:56
conference contribution
posted on 2024-06-04, 01:53, authored by T Zhu, P Xiong, Gang Li, W Zhou, PS Yu
As one of the most influential privacy definitions, differential privacy provides a rigorous and provable privacy guarantee for data publishing. In the Big Data era, however, the curator has to release a large number of queries in a batch or a synthetic dataset. Two challenges need to be tackled: one is how to decrease the correlation between large sets of queries, and the other is how to predict results for fresh queries. This paper transforms the data publishing problem into a machine learning problem, in which queries are considered as training samples and a prediction model is released rather than query results or synthetic datasets. Once the model is published, it can be used both to answer the currently submitted queries and to predict results for fresh queries from the public. Compared with traditional methods, the proposed prediction model enhances the accuracy of query results for non-interactive publishing. We prove that the learning model can retain the utility of published queries while preserving privacy.
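As a rough illustration of the idea described above (not the paper's actual algorithm), the sketch below treats a batch of range-counting queries as training samples: each submitted query is answered once under the Laplace mechanism, a simple least-squares model is fit on the (query, noisy answer) pairs, and only that model is published to answer both the submitted queries and fresh, unseen ones. The toy dataset, the query encoding, the budget split, and the linear model are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: ages of 1,000 individuals (not from the paper).
data = rng.integers(18, 90, size=1000)

def range_count_query(lo, hi):
    """Counting query: how many records fall in [lo, hi)."""
    return np.sum((data >= lo) & (data < hi))

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Standard Laplace mechanism for epsilon-differential privacy."""
    return true_answer + rng.laplace(scale=sensitivity / epsilon)

# Batch of submitted queries, encoded by their (lo, hi) endpoints.
# These play the role of the "training samples" in the abstract.
queries = [(lo, lo + 10) for lo in range(18, 80, 2)]
epsilon_total = 1.0
epsilon_per_query = epsilon_total / len(queries)  # naive sequential composition

X = np.array(queries, dtype=float)                # query features
y = np.array([laplace_mechanism(range_count_query(lo, hi), 1.0, epsilon_per_query)
              for lo, hi in queries])             # noisy answers; a count has sensitivity 1

# Fit a simple linear model on (query, noisy answer) pairs.
# The model coefficients, not the noisy answers, are what gets published.
X_design = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

def predict(lo, hi):
    """Answer a query (including a fresh one) from the published model."""
    return np.array([lo, hi, 1.0]) @ coef

# A fresh query that was never submitted to the curator:
print("model answer for [40, 55):", predict(40, 55))
print("true answer for [40, 55): ", range_count_query(40, 55))
```

Because the model is trained only on answers that are already differentially private, publishing it consumes no additional privacy budget (post-processing); the paper's contribution lies in showing that such a released model can retain the utility of the published queries and generalize to fresh ones.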