DP-QIC: A differential privacy scheme based on quasi-identifier classification for big data publication

  • Methodologies and Application
Soft Computing

Abstract

With the advent of the big data era, privacy protection has become an important topic in data publication. Traditional privacy protection methods, namely k-anonymity and its extensions, are not absolutely secure, because an adversary with background knowledge can still determine the owner of a record. Differential privacy provides a reasonable alternative, but existing solutions ignore the correlation between sensitive attributes and other attributes. In this paper, we propose DP-QIC, a new differential privacy scheme based on quasi-identifier classification for big data publication. It publishes data by obfuscating attribute correlations. We introduce a quasi-identifier classification based on sensitive attributes, together with a privacy ratio for evaluating the vulnerability of a data set. DP-QIC protects data privacy in four steps: data collection; grouping and shuffling; generalization; and merging and noise adding. Together, these steps retain the overall statistical characteristics of the data set. Moreover, the exponential mechanism and the Laplace mechanism are integrated to provide higher flexibility and a stronger level of privacy protection, so DP-QIC can be applied to privacy processing of different data groups in future development. Finally, we compare the performance of our scheme with two other well-known schemes. Experimental results demonstrate that DP-QIC has clear advantages in data utility, privacy protection, and processing efficiency.
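The abstract names the Laplace mechanism and the exponential mechanism without defining them. The sketch below shows the generic textbook forms of both, and is not the authors' implementation: the function names, the `sensitivity` and `epsilon` parameters, and the example utility scores are all assumptions made for illustration. How DP-QIC combines the two mechanisms and allocates the privacy budget is described only in the full article.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Return true_value perturbed by Laplace noise of scale sensitivity / epsilon.

    Hypothetical helper: a generic Laplace mechanism for a numeric query,
    not code taken from DP-QIC.
    """
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity)).

    Hypothetical helper: a generic exponential mechanism for a
    non-numeric choice, not code taken from DP-QIC.
    """
    scores = [utility(c) for c in candidates]
    best = max(scores)
    # Subtracting the best score leaves the probabilities unchanged
    # and avoids overflow in exp().
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Noisy count: a counting query has sensitivity 1 (assumed example).
    print(laplace_mechanism(120, sensitivity=1.0, epsilon=0.5, rng=rng))
    # Noisy choice among generalization levels (labels and scores are made up).
    scores = {"age 20-29": 3.0, "age 20-39": 2.0, "age *": 1.0}
    print(exponential_mechanism(list(scores), scores.get, 1.0, 0.5, rng))
```

A smaller `epsilon` means larger noise in the first function and a flatter selection distribution in the second, which is the usual trade-off between privacy and data utility.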

Acknowledgements

This work is supported by the National Natural Science Foundation of China (62072239, 61702266), the Guangxi Key Laboratory of Trusted Software (KX202029), and the Special Project of “Higher Education Informatization Research” of the China Higher Education Association (No. 2020XXHD06).

Author information

Corresponding author

Correspondence to Anmin Fu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, S., Fu, A., Yu, S. et al. DP-QIC: A differential privacy scheme based on quasi-identifier classification for big data publication. Soft Comput 25, 7325–7339 (2021). https://doi.org/10.1007/s00500-021-05692-7

  • DOI: https://doi.org/10.1007/s00500-021-05692-7
