Abstract
Privacy-preserving data aggregation is an important problem that has attracted extensive study. The state-of-the-art techniques for solving this problem is differential privacy, which offers a strong privacy guarantee without making strong assumptions about the attacker. However, existing solutions cannot effectively query data aggregation from high-dimensional datasets under differential privacy guarantee. Particularly, when the input dataset contains large number of dimensions, existing solutions must inject large scale of noise into returned aggregates. To address the above issue, this paper proposes an algorithm for querying differentially private approximate aggregates from high-dimensional datasets. Given a dataset D, our algorithm first develops a \(\varepsilon '\)-differentially private feature selection method that is based on a data sampling process over a kd-tree, which allows us to obtain a differentially private low-dimensional dataset with representative instances. After that, our algorithm samples independent samples from the kd-tree aiming at obtaining \((\alpha ', \delta ')\)-approximate aggregates. Finally, a model is proposed to determine the relevance between privacy and utility budgets such that the final aggregate still satisfies the accuracy requirements specified by data consumers. Intuitively, the proposed algorithm circumvents the dilemma of both dimensionality and the height threshold of kd-tree, as it samples a low-dimensional dataset S and queries aggregates from S, instead of the kd-tree. Satisfying user-specified privacy and utility budgets after multiple-stages approximation is significantly challenging, and we presents a novel model to determine the parameters’ relevance.
Similar content being viewed by others
References
Bamunu Mudiyanselage TK, Xiao X, Zhang Y, Pan Y (2020) Deep fuzzy neural networks for biomarker selection for accurate cancer detection. IEEE Trans Fuzzy Syst 28(12):3219–3228
Cai Z, Zheng X (2018) A private and efficient mechanism for data uploading in smart cyber-physical systems. IEEE Trans Network Sci Eng 7:766–775
Cai Z, He Z, Guan X, Li Y (2016) Collective data-sanitization for preventing sensitive information inference attacks in social networks. IEEE Trans Dependable Secure Comput 15(4):577–590
Cai Z, Zheng X, Yu J (2019) A differential-private framework for urban traffic flows estimation via taxi companies. IEEE Trans Ind Inform 15(12):6492–6499
Cai Z, He Z (2019) Trading private range counting over big iot data. In: 2019 IEEE 39th international conference on distributed computing systems (ICDCS). IEEE, pp 144–153
Day W-Y, Li N, Lyu M (2016) Publishing graph degree distribution with node differential privacy. In: Proceedings of the 2016 international conference on management of data. ACM, pp 123–138
Dwork C (2006) Differential privacy. In: Automata. Languages and Programming pp 1–12
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography conference. Springer, pp 265–284
Friedman JH, Bentley JL, Finkel RA (1976) An algorithm for finding best matches in logarithmic time. ACM Trans Math Softw 3:209–226
He Z, Cai Z, Cheng S, Wang X (2015) Approximate aggregation for tracking quantiles and range countings in wireless sensor networks. Theor Comput Sci 607:381–390
He Z, Cai Z, Yu J, Wang X, Sun Y, Li Y (2016) Cost-efficient strategies for restraining rumor spreading in mobile social networks. IEEE Trans Veh Technol 66(3):2789–2800
He Z, Cai Z, Yu J (2017) Latent-data privacy preserving with customized data utility for social network data. IEEE Trans Veh Technol 67(1):665–673
He Z, Li Y, Li J, Li K, Cai Q, Liang Y (2018) Achieving differential privacy of genomic data releasing via belief propagation. Tsinghua Sci Techno 23(4):389–395
Kasiviswanathan SP, Lee HK, Nissim K, Raskhodnikova S, Smith A (2011) What can we learn privately? SIAM J Comput 40(3):793–826
Lei J (2011) Differentially private m-estimators. In: Advances in neural information processing systems, pp 361–369
Li G, Peng S, Wang C, Niu J, Yuan Y (2019) An energy-efficient data collection scheme using denoising autoencoder in wireless sensor networks. Tsinghua Sci Technol 24(1):86–96
Li M, Wang H, Li J (2020) Mining conditional functional dependency rules on big data. Big Data Mining Anal 3(1):68–84
Liu L, Chen X, Lu Z, Wang L, Wen X (2019) Mobile-edge computing framework with data compression for wireless network in energy internet. Tsinghua Sci Technol 24(3):271–280
Liu J, Wang L, Liu J (2019) Efficient preference clustering via random fourier features. Big Data Min Anal 2(3):195–204
McSherry F, Talwar K (2007) Mechanism design via differential privacy. In: 48th annual IEEE symposium on foundations of computer science (FOCS’07). IEEE, pp 94–103
Su D, Cao J, Li N, Bertino E, Jin H (2016) Differentially private k-means clustering. In: Proceedings of the sixth ACM conference on data and application security and privacy. ACM, pp 26–37
Wu W, Yu Z, He J (2019) A semi-supervised deep network embedding approach based on the neighborhood structure. Big Data Min Anal 2(3):205–216
Zaki MJ, Pan Y (2002) Introduction: recent developments in parallel and distributed data mining. Distrib Parallel Databases 11(2):123–127
Zhang J, Cormode G, Procopiuc CM, Srivastava D, Xiao X (2017) Privbayes: private data release via Bayesian networks. ACM Trans Database Syst 42(4):25
Zhang H, Wang H, Li J, Gao H (2018) A generic data analytics system for manufacturing production. Big Data Min Anal 1(2):160–171
Zhang J, Xiao X, Xie X (2016) Privtree: a differentially private algorithm for hierarchical decompositions. In: Proceedings of the 2016 international conference on management of data, pp 155–170
Zheng X, Cai Z (2020) Privacy-preserved data sharing towards multiple parties in industrial Io Ts. IEEE J Sel Areas Commun 38:968–979
Zheng X, Cai Z, Yu J, Wang C, Li Y (2017) Follow but no track: privacy preserved profile publishing in cyber-physical social systems. IEEE Internet Things J 4(6):1868–1878
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
He, Z., Sai, A.M.V.V., Huang, Y. et al. Differentially private approximate aggregation based on feature selection. J Comb Optim 41, 318–327 (2021). https://doi.org/10.1007/s10878-020-00666-1
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10878-020-00666-1