Scalable Iterative Implementation of Mondrian for Big Data Multidimensional Anonymisation

Zhang, Xuyun; Qi, Lianyong; He, Qiang; Dou, Wanchun

doi:10.1007/978-3-319-49145-5_31

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10067)

Included in the following conference series:

International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage

Abstract

Scalable data processing platforms built on cloud computing are becoming increasingly attractive as infrastructure for big data mining and analytics applications. However, privacy concerns are one of the major obstacles to making use of public cloud platforms. In practice, data generalisation is a widely adopted anonymisation technique for preserving privacy in data publishing and sharing scenarios. Multidimensional anonymisation, a global-recoding generalisation scheme, has been a recent focus because it can balance data obfuscation against data usability. Existing approaches address the scalability problem of multidimensional anonymisation for data sets much larger than main memory by storing data on disk at runtime, which incurs an impractical serial I/O cost. In this paper, we propose a scalable iterative multidimensional anonymisation approach for big data sets based on MapReduce, a state-of-the-art large-scale data processing paradigm. The basic idea is to partition a large data set recursively into smaller partitions using MapReduce until every partition fits in the memory of a single computing node. A tree indexing structure is proposed to support this recursive computation on MapReduce for data partitioning in multidimensional anonymisation. Experimental results on real-life data sets demonstrate that the proposed approach significantly improves the scalability and time-efficiency of multidimensional anonymisation over existing approaches, and is therefore applicable to big data applications.
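
The partition-then-anonymise idea in the abstract can be illustrated with a small single-process Python sketch. It is not the authors' MapReduce implementation: the names `Node`, `choose_cut`, `build` and `anonymise`, the `memory_limit` parameter (counted in records), the restriction to numeric attributes, and the widest-range/median-cut heuristic (in the general style of Mondrian) are all assumptions made for illustration. The first phase splits the data into a tree until each leaf is small enough; the second phase runs the same splitting rule inside each leaf until no cut leaves at least k records on both sides.

```python
from dataclasses import dataclass, field
from statistics import median_low
from typing import List, Optional, Tuple

Record = Tuple[int, ...]  # assumption: numeric quasi-identifiers only


@dataclass
class Node:
    """Node of a hypothetical partition tree; only leaves hold records."""
    records: List[Record] = field(default_factory=list)
    dim: Optional[int] = None   # index of the split attribute (None for a leaf)
    cut: Optional[int] = None   # records with value <= cut go to the left child
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def choose_cut(records: List[Record], k: int) -> Optional[Tuple[int, int]]:
    """Try attributes from widest to narrowest value range; return the first
    median cut that leaves at least k records on each side, else None."""
    def width(d: int) -> int:
        vals = [r[d] for r in records]
        return max(vals) - min(vals)

    for d in sorted(range(len(records[0])), key=width, reverse=True):
        cut = median_low(r[d] for r in records)
        n_left = sum(1 for r in records if r[d] <= cut)
        if n_left >= k and len(records) - n_left >= k:
            return d, cut
    return None


def build(records: List[Record], k: int, stop_size: int) -> Node:
    """Split recursively until a partition holds at most stop_size records
    or no allowable cut exists. Requires k >= 1 and non-empty records."""
    found = choose_cut(records, k) if len(records) > stop_size else None
    if found is None:
        return Node(records=records)
    dim, cut = found
    return Node(
        dim=dim,
        cut=cut,
        left=build([r for r in records if r[dim] <= cut], k, stop_size),
        right=build([r for r in records if r[dim] > cut], k, stop_size),
    )


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes of the tree from left to right."""
    if node.dim is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def generalise(records: List[Record]) -> List[Tuple[Tuple[int, int], ...]]:
    """Replace every record by the (min, max) box of its partition."""
    dims = range(len(records[0]))
    box = tuple((min(r[d] for r in records), max(r[d] for r in records)) for d in dims)
    return [box] * len(records)


def anonymise(records: List[Record], k: int, memory_limit: int):
    """Phase 1: coarse split until partitions hold <= memory_limit records
    (the step a distributed job would perform). Phase 2: finish each
    partition independently with the same rule, then generalise."""
    out = []
    for coarse in leaves(build(records, k, memory_limit)):
        for leaf in leaves(build(coarse.records, k, stop_size=0)):
            out.extend(generalise(leaf.records))
    return out
```

Calling `anonymise(records, k=2, memory_limit=4)` on a list of `(age, zipcode)` tuples returns one generalised box per input record, in partition order rather than input order. In a real deployment the first phase would run as distributed jobs and the tree would only record the split decisions; here everything stays in one process to keep the sketch short.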

Notes

1. http://archive.ics.uci.edu/ml/datasets/Adult.

Acknowledgments

This paper is partially supported by the Open Project of State Key Laboratory for Novel Software Technology (No. KFKT2015A03), the Natural Science Foundation of China (No. 61402258), the China Postdoctoral Science Foundation (No. 2015M571739), and the Open Project of State Key Laboratory for Novel Software Technology (No. KFKT2016B22).

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Auckland, Auckland, 1023, New Zealand
Xuyun Zhang
State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
Lianyong Qi & Wanchun Dou
School of Information Science and Engineering, Qufu Normal University, Qufu, 276826, China
Lianyong Qi
School of Software and Electrical Engineering, Swinburne University of Technology, Victoria, 3122, Australia
Qiang He

Corresponding author

Correspondence to Xuyun Zhang.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
Colorado State University, Fort Collins, Colorado, USA
Indrakshi Ray
University of the West of Scotland, Paisley, Glasgow, United Kingdom
Jose M. Alcaraz Calero
Indian Institute of Information Technology and Management Kerala (IIITM-K), Trivandrum, Kerala, India
Sabu M. Thampi

Cite this paper

Zhang, X., Qi, L., He, Q., Dou, W. (2016). Scalable Iterative Implementation of Mondrian for Big Data Multidimensional Anonymisation. In: Wang, G., Ray, I., Alcaraz Calero, J., Thampi, S. (eds) Security, Privacy and Anonymity in Computation, Communication and Storage. SpaCCS 2016. Lecture Notes in Computer Science, vol 10067. Springer, Cham. https://doi.org/10.1007/978-3-319-49145-5_31

DOI: https://doi.org/10.1007/978-3-319-49145-5_31
Published: 10 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49144-8
Online ISBN: 978-3-319-49145-5
eBook Packages: Computer Science, Computer Science (R0)
