Achieving Privacy Preservation Constraints in Missing-Value Datasets

  • Original Research
SN Computer Science

Abstract

Privacy violations must be taken into consideration whenever datasets are released for public use. To address them, various anonymization models have been proposed, e.g., k-anonymity, l-diversity, and t-closeness. However, these models generally assume that every attribute of every record in the dataset is complete. As a result, they may be insufficient to prevent privacy violations in datasets that permit missing values, e.g., rating datasets and trajectory datasets. This work therefore proposes a new privacy preservation model for missing-value datasets. Beyond preserving privacy, the proposed model aims to maintain as much data utility as possible. This work also presents a data utility metric suited to missing-value datasets. Finally, it shows, by reduction from the Exact Cover by 3-Sets (X3C) problem, that the problem posed by the proposed model is NP-complete.
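To make the standard models named in the abstract more concrete, the following is a minimal Python sketch of checking k-anonymity and distinct l-diversity on a table that contains missing values. It is not the model proposed in this paper. The function names, the toy table, and the choice to treat a missing quasi-identifier value as an ordinary value when grouping records are all illustrative assumptions; ignoring missing sensitive values in the l-diversity count is a further assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping, Optional, Sequence, Tuple

# A record maps attribute names to values; None (or an absent key) marks a missing value.
Record = Mapping[str, Optional[Hashable]]
ClassKey = Tuple[Optional[Hashable], ...]


def equivalence_classes(
    records: Sequence[Record], quasi_identifiers: Sequence[str]
) -> Dict[ClassKey, List[Record]]:
    """Group records that share the same quasi-identifier values.

    Assumption (illustrative only): a missing value is treated as one more
    literal value, so records missing the same quasi-identifiers share a class.
    """
    classes: Dict[ClassKey, List[Record]] = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(attr) for attr in quasi_identifiers)
        classes[key].append(rec)
    return dict(classes)


def is_k_anonymous(
    records: Sequence[Record], quasi_identifiers: Sequence[str], k: int
) -> bool:
    """k-anonymity: every equivalence class contains at least k records."""
    groups = equivalence_classes(records, quasi_identifiers).values()
    return all(len(group) >= k for group in groups)


def is_distinct_l_diverse(
    records: Sequence[Record], quasi_identifiers: Sequence[str], sensitive: str, l: int
) -> bool:
    """Distinct l-diversity: every class has at least l distinct sensitive values.

    Assumption: missing sensitive values are ignored, not counted as a value.
    """
    groups = equivalence_classes(records, quasi_identifiers).values()
    return all(
        len({rec.get(sensitive) for rec in group} - {None}) >= l
        for group in groups
    )


# Hypothetical toy table: the zip code is missing for two records.
table = [
    {"age": "20-29", "zip": "501**", "disease": "flu"},
    {"age": "20-29", "zip": "501**", "disease": "cold"},
    {"age": "20-29", "zip": None, "disease": "flu"},
    {"age": "20-29", "zip": None, "disease": "flu"},
]
qi = ["age", "zip"]
print(is_k_anonymous(table, qi, k=2))                   # True
print(is_distinct_l_diverse(table, qi, "disease", l=2))  # False
```

Under this naive treatment, the two records with a missing zip code form their own equivalence class. The table is 2-anonymous, but that class holds only one distinct disease, so it fails distinct 2-diversity. This illustrates how the pattern of missing values alone can shape the classes that a standard model checks.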

Figures: Fig. 1, Fig. 2, Fig. 3

Author information

Corresponding author

Correspondence to Surapon Riyana.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This paper does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Privacy, Data Protection and Digital Identity” guest edited by Fernando Boavida, Andrea Praitano and Georgios V. Lioudakis.

About this article

Cite this article

Riyana, S., Nanthachumphu, S. & Riyana, N. Achieving Privacy Preservation Constraints in Missing-Value Datasets. SN COMPUT. SCI. 1, 227 (2020). https://doi.org/10.1007/s42979-020-00241-9

  • DOI: https://doi.org/10.1007/s42979-020-00241-9
