Abstract
With the advent of the big data era, data privacy protection has become an important topic in the field of data publication. Unfortunately, traditional privacy protection methods such as k-anonymity and its extensions are not fully secure, because an adversary with background knowledge can still determine the owner of a record. Differential privacy provides a reasonable alternative, but existing solutions ignore the correlation between sensitive attributes and other attributes. In this paper, we propose DP-QIC, a new differential privacy scheme based on quasi-identifier classification for big data publication. It is a data publishing scheme built on the obfuscation of attribute correlation. We introduce a quasi-identifier classification based on sensitive attributes, together with a privacy ratio for evaluating the vulnerability of a data set. DP-QIC protects data privacy in four steps: data collection; grouping and shuffling; generalization; and merging and noise adding. This process retains the overall statistical characteristics of the data set. Moreover, the exponential mechanism and the Laplace mechanism are integrated to provide greater flexibility and a stronger level of privacy protection, so DP-QIC can be applied to the privacy processing of different data groups in future development. Finally, we compare the performance of our scheme with that of two other well-known schemes. Experimental results demonstrate that DP-QIC has clear advantages in data utility, privacy protection, and processing efficiency.
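The abstract names the Laplace mechanism and the exponential mechanism but does not describe how DP-QIC combines them. As a point of reference only, the following is a minimal sketch of the two mechanisms in their standard textbook form; it is not the authors' implementation. The function names, parameters, and the example data are all hypothetical and chosen purely for illustration.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Return true_value plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * utility(candidate) / (2 * sensitivity))."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scores = [utility(c) for c in candidates]
    best = max(scores)
    # Subtracting the best score leaves the probabilities unchanged
    # and avoids overflow in exp() for large scores.
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical example: publish a noisy count for one generalized group,
    # and privately select a representative value by its frequency.
    rng = random.Random(42)
    group_counts = {"20-29": 120, "30-39": 95, "40-49": 30}
    noisy = laplace_mechanism(group_counts["20-29"], sensitivity=1.0, epsilon=0.5, rng=rng)
    chosen = exponential_mechanism(list(group_counts), group_counts.get, 1.0, 0.5, rng)
    print(noisy, chosen)
```

In this sketch a smaller `epsilon` means more noise for numeric outputs and a flatter selection distribution for categorical ones, which is the usual trade-off between privacy and data utility.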
Acknowledgements
This work is supported by the National Natural Science Foundation of China (62072239, 61702266), the Guangxi Key Laboratory of Trusted Software (KX202029), and the Special Project of “Higher Education Informatization Research” of the China Higher Education Association (No. 2020XXHD06).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Chen, S., Fu, A., Yu, S. et al. DP-QIC: A differential privacy scheme based on quasi-identifier classification for big data publication. Soft Comput 25, 7325–7339 (2021). https://doi.org/10.1007/s00500-021-05692-7
DOI: https://doi.org/10.1007/s00500-021-05692-7