A framework for utility enhanced incomplete microdata anonymization

Gong, Qiyuan; Yang, Ming; Chen, Zhouguo; Wu, Wenjia; Luo, Junzhou

doi:10.1007/s10586-017-0795-6

Published: 28 February 2017

Volume 20, pages 1749–1764, (2017)

Cluster Computing

Qiyuan Gong^1,2,
Ming Yang^1,2,
Zhouguo Chen^2,
Wenjia Wu^1 &
Junzhou Luo^1

Abstract

Incomplete microdata, i.e., microdata with missing values, are very common in real-world datasets. However, existing anonymization techniques, which were developed for complete datasets, suffer serious information loss on incomplete microdata because of missing value pollution. In this paper, we propose a framework for utility-enhanced anonymization of incomplete microdata to address this issue. First, we study the properties of missing value pollution in generalization. Guided by these properties, we develop two top-down anonymization algorithms that preserve data utility on incomplete microdata. Extensive experiments on real-world datasets show that our techniques outperform state-of-the-art techniques in terms of information loss and missing value pollution.

Notes

Mondrian, Enhanced Mondrian and semi-partition.
Downloadable at http://archive.ics.uci.edu/ml/datasets/Adult.
Downloadable at https://sites.google.com/site/informsdataminingcontest/.
According to the documents provided by UCI and INFORMS, ‘?’ in the Adult data and -1, -7, -8, -9 in the INFORMS data are treated as missing values.
We assume that the age range is [1, 100] and the Zipcode range is [10001, 50000].
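
The missing-value markers listed in the notes above can be handled with a small preprocessing step before anonymization. The following is a minimal sketch, not the authors' code: the function names `normalize_record` and `missing_ratio` are hypothetical, and it assumes records have already been split into string fields (for example by a CSV reader) and that each marker denotes a missing value in every attribute where it appears.

```python
from typing import List, Optional, Sequence, Set

# Missing-value markers as stated in the notes above.
ADULT_MISSING: Set[str] = {"?"}
INFORMS_MISSING: Set[str] = {"-1", "-7", "-8", "-9"}


def normalize_record(fields: Sequence[str], markers: Set[str]) -> List[Optional[str]]:
    """Return a copy of the record with missing markers replaced by None.

    Hypothetical helper: fields are stripped of surrounding whitespace
    before being compared against the marker set.
    """
    cleaned = [f.strip() for f in fields]
    return [None if f in markers else f for f in cleaned]


def missing_ratio(records: Sequence[Sequence[Optional[str]]]) -> float:
    """Fraction of records that contain at least one missing value."""
    if not records:
        return 0.0
    incomplete = sum(1 for r in records if any(v is None for v in r))
    return incomplete / len(records)


if __name__ == "__main__":
    # Toy rows in the style of the Adult data (values are illustrative only).
    raw = [["39", " State-gov", " Bachelors"], ["54", " ?", " HS-grad"]]
    rows = [normalize_record(r, ADULT_MISSING) for r in raw]
    print(rows)
    print(missing_ratio(rows))
```

Mapping every marker to a single `None` value keeps later steps independent of the dataset-specific encoding; for the INFORMS data the same call would take `INFORMS_MISSING` instead.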

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants No. 61572130, 61632008, 61320106007, 61502100 and 61402104, the Jiangsu Provincial Natural Science Foundation under Grants BK20150628, BK20140648 and BK20150637, the Jiangsu Provincial Key Technology R&D Program under Grant BE2014603, the Qing Lan Project of Jiangsu Province, the Jiangsu Provincial Key Laboratory of Network and Information Security under Grant BM2003201, and the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China under Grant 93K-9.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Southeast University, Nanjing, China
Qiyuan Gong, Ming Yang, Wenjia Wu & Junzhou Luo
Science and Technology on Communication Security Laboratory, Chengdu, China
Qiyuan Gong, Ming Yang & Zhouguo Chen

Corresponding author

Correspondence to Junzhou Luo.

About this article

Cite this article

Gong, Q., Yang, M., Chen, Z. et al. A framework for utility enhanced incomplete microdata anonymization. Cluster Comput 20, 1749–1764 (2017). https://doi.org/10.1007/s10586-017-0795-6

Received: 31 July 2016
Revised: 27 January 2017
Accepted: 15 February 2017
Published: 28 February 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10586-017-0795-6
