Skip to main content
Log in

Differentially private approximate aggregation based on feature selection

  • Published:
Journal of Combinatorial Optimization Aims and scope Submit manuscript

    We’re sorry, something doesn't seem to be working properly.

    Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Abstract

Privacy-preserving data aggregation is an important problem that has attracted extensive study. The state-of-the-art techniques for solving this problem is differential privacy, which offers a strong privacy guarantee without making strong assumptions about the attacker. However, existing solutions cannot effectively query data aggregation from high-dimensional datasets under differential privacy guarantee. Particularly, when the input dataset contains large number of dimensions, existing solutions must inject large scale of noise into returned aggregates. To address the above issue, this paper proposes an algorithm for querying differentially private approximate aggregates from high-dimensional datasets. Given a dataset D, our algorithm first develops a \(\varepsilon '\)-differentially private feature selection method that is based on a data sampling process over a kd-tree, which allows us to obtain a differentially private low-dimensional dataset with representative instances. After that, our algorithm samples independent samples from the kd-tree aiming at obtaining \((\alpha ', \delta ')\)-approximate aggregates. Finally, a model is proposed to determine the relevance between privacy and utility budgets such that the final aggregate still satisfies the accuracy requirements specified by data consumers. Intuitively, the proposed algorithm circumvents the dilemma of both dimensionality and the height threshold of kd-tree, as it samples a low-dimensional dataset S and queries aggregates from S, instead of the kd-tree. Satisfying user-specified privacy and utility budgets after multiple-stages approximation is significantly challenging, and we presents a novel model to determine the parameters’ relevance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Bamunu Mudiyanselage TK, Xiao X, Zhang Y, Pan Y (2020) Deep fuzzy neural networks for biomarker selection for accurate cancer detection. IEEE Trans Fuzzy Syst 28(12):3219–3228

    Google Scholar 

  • Cai Z, Zheng X (2018) A private and efficient mechanism for data uploading in smart cyber-physical systems. IEEE Trans Network Sci Eng 7:766–775

    Article  MathSciNet  Google Scholar 

  • Cai Z, He Z, Guan X, Li Y (2016) Collective data-sanitization for preventing sensitive information inference attacks in social networks. IEEE Trans Dependable Secure Comput 15(4):577–590

    Google Scholar 

  • Cai Z, Zheng X, Yu J (2019) A differential-private framework for urban traffic flows estimation via taxi companies. IEEE Trans Ind Inform 15(12):6492–6499

    Article  Google Scholar 

  • Cai Z, He Z (2019) Trading private range counting over big iot data. In: 2019 IEEE 39th international conference on distributed computing systems (ICDCS). IEEE, pp 144–153

  • Day W-Y, Li N, Lyu M (2016) Publishing graph degree distribution with node differential privacy. In: Proceedings of the 2016 international conference on management of data. ACM, pp 123–138

  • Dwork C (2006) Differential privacy. In: Automata. Languages and Programming pp 1–12

  • Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography conference. Springer, pp 265–284

  • Friedman JH, Bentley JL, Finkel RA (1976) An algorithm for finding best matches in logarithmic time. ACM Trans Math Softw 3:209–226

    Article  Google Scholar 

  • He Z, Cai Z, Cheng S, Wang X (2015) Approximate aggregation for tracking quantiles and range countings in wireless sensor networks. Theor Comput Sci 607:381–390

    Article  MathSciNet  Google Scholar 

  • He Z, Cai Z, Yu J, Wang X, Sun Y, Li Y (2016) Cost-efficient strategies for restraining rumor spreading in mobile social networks. IEEE Trans Veh Technol 66(3):2789–2800

    Article  Google Scholar 

  • He Z, Cai Z, Yu J (2017) Latent-data privacy preserving with customized data utility for social network data. IEEE Trans Veh Technol 67(1):665–673

    Article  Google Scholar 

  • He Z, Li Y, Li J, Li K, Cai Q, Liang Y (2018) Achieving differential privacy of genomic data releasing via belief propagation. Tsinghua Sci Techno 23(4):389–395

    Article  Google Scholar 

  • Kasiviswanathan SP, Lee HK, Nissim K, Raskhodnikova S, Smith A (2011) What can we learn privately? SIAM J Comput 40(3):793–826

    Article  MathSciNet  Google Scholar 

  • Lei J (2011) Differentially private m-estimators. In: Advances in neural information processing systems, pp 361–369

  • Li G, Peng S, Wang C, Niu J, Yuan Y (2019) An energy-efficient data collection scheme using denoising autoencoder in wireless sensor networks. Tsinghua Sci Technol 24(1):86–96

    Article  Google Scholar 

  • Li M, Wang H, Li J (2020) Mining conditional functional dependency rules on big data. Big Data Mining Anal 3(1):68–84

    Article  Google Scholar 

  • Liu L, Chen X, Lu Z, Wang L, Wen X (2019) Mobile-edge computing framework with data compression for wireless network in energy internet. Tsinghua Sci Technol 24(3):271–280

    Article  Google Scholar 

  • Liu J, Wang L, Liu J (2019) Efficient preference clustering via random fourier features. Big Data Min Anal 2(3):195–204

    Article  Google Scholar 

  • McSherry F, Talwar K (2007) Mechanism design via differential privacy. In: 48th annual IEEE symposium on foundations of computer science (FOCS’07). IEEE, pp 94–103

  • Su D, Cao J, Li N, Bertino E, Jin H (2016) Differentially private k-means clustering. In: Proceedings of the sixth ACM conference on data and application security and privacy. ACM, pp 26–37

  • Wu W, Yu Z, He J (2019) A semi-supervised deep network embedding approach based on the neighborhood structure. Big Data Min Anal 2(3):205–216

    Article  Google Scholar 

  • Zaki MJ, Pan Y (2002) Introduction: recent developments in parallel and distributed data mining. Distrib Parallel Databases 11(2):123–127

    Google Scholar 

  • Zhang J, Cormode G, Procopiuc CM, Srivastava D, Xiao X (2017) Privbayes: private data release via Bayesian networks. ACM Trans Database Syst 42(4):25

    Article  MathSciNet  Google Scholar 

  • Zhang H, Wang H, Li J, Gao H (2018) A generic data analytics system for manufacturing production. Big Data Min Anal 1(2):160–171

    Article  Google Scholar 

  • Zhang J, Xiao X, Xie X (2016) Privtree: a differentially private algorithm for hierarchical decompositions. In: Proceedings of the 2016 international conference on management of data, pp 155–170

  • Zheng X, Cai Z (2020) Privacy-preserved data sharing towards multiple parties in industrial Io Ts. IEEE J Sel Areas Commun 38:968–979

    Article  Google Scholar 

  • Zheng X, Cai Z, Yu J, Wang C, Li Y (2017) Follow but no track: privacy preserved profile publishing in cyber-physical social systems. IEEE Internet Things J 4(6):1868–1878

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zaobo He.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

He, Z., Sai, A.M.V.V., Huang, Y. et al. Differentially private approximate aggregation based on feature selection. J Comb Optim 41, 318–327 (2021). https://doi.org/10.1007/s10878-020-00666-1

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10878-020-00666-1

Keywords

Navigation