Scalable Iterative Implementation of Mondrian for Big Data Multidimensional Anonymisation

  • Conference paper
Security, Privacy and Anonymity in Computation, Communication and Storage (SpaCCS 2016)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10067)

Abstract

Scalable data processing platforms built on cloud computing are becoming increasingly attractive as infrastructure for big data mining and analytics applications. However, privacy concerns remain one of the major obstacles to using public cloud platforms. In practice, data generalisation is a widely adopted anonymisation technique for preserving privacy in data publishing and sharing scenarios. Multidimensional anonymisation, a global-recoding generalisation scheme, has been a recent focus because of its ability to balance data obfuscation and data usability. Existing approaches handle the scalability problem of multidimensional anonymisation for data sets much larger than main memory by storing data on disk at runtime, which incurs an impractical serial I/O cost. In this paper, we propose a scalable iterative multidimensional anonymisation approach for big data sets based on MapReduce, a state-of-the-art large-scale data processing paradigm. The basic idea is to partition a large data set recursively into smaller partitions using MapReduce until every partition fits in the memory of a single computing node. A tree indexing structure is proposed to realise this recursive computation on MapReduce for data partitioning in multidimensional anonymisation. Experimental results on real-life data sets demonstrate that the proposed approach significantly improves the scalability and time-efficiency of multidimensional anonymisation over existing approaches, and is therefore applicable to big data applications.
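The recursive partitioning idea in the abstract can be illustrated with a minimal single-machine sketch of Mondrian-style median splitting. This is not the paper's MapReduce implementation: the function names, the sample records, and the binary-path node labels (a stand-in for a tree index) are all assumptions made for illustration, and attributes are ranked by raw range rather than by a normalised range.

```python
from statistics import median_low


def value_range(records, dim):
    """Width of the interval covered by attribute `dim` in `records`."""
    values = [r[dim] for r in records]
    return max(values) - min(values)


def mondrian_partition(records, dims, k, node_id="1"):
    """Recursively split `records`; return {node_id: leaf_records}.

    `node_id` is a hypothetical binary-path label: the root is "1" and
    each split appends "0" (lower half) or "1" (upper half).
    """
    # Try the widest attribute first, falling back to narrower ones.
    for dim in sorted(dims, key=lambda d: value_range(records, d), reverse=True):
        split = median_low([r[dim] for r in records])
        left = [r for r in records if r[dim] <= split]
        right = [r for r in records if r[dim] > split]
        # A cut is allowable only if both halves keep at least k records.
        if len(left) >= k and len(right) >= k:
            leaves = mondrian_partition(left, dims, k, node_id + "0")
            leaves.update(mondrian_partition(right, dims, k, node_id + "1"))
            return leaves
    # No allowable cut on any attribute: this partition is a leaf.
    return {node_id: records}


def generalise(leaves, dims):
    """Map each leaf to the (min, max) range of every attribute in it."""
    return {
        node_id: tuple((min(r[d] for r in recs), max(r[d] for r in recs)) for d in dims)
        for node_id, recs in leaves.items()
    }


# Hypothetical (age, income) records, used only for illustration.
SAMPLE = [
    (25, 50000), (27, 52000), (35, 60000), (37, 61000),
    (45, 80000), (47, 82000), (55, 90000), (57, 91000),
]

if __name__ == "__main__":
    leaves = mondrian_partition(SAMPLE, dims=(0, 1), k=2)
    for node_id, ranges in sorted(generalise(leaves, (0, 1)).items()):
        print(node_id, ranges, len(leaves[node_id]))
```

Every leaf holds at least `k` records, so publishing the per-leaf ranges in place of the exact values yields a k-anonymous view of the quasi-identifiers. In a distributed setting, a path-style label like `node_id` could serve as the key that routes a record to its current partition in each iteration, which is one plausible way to read the role of a tree index, though the abstract does not specify the structure.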

Notes

  1. http://archive.ics.uci.edu/ml/datasets/Adult

Acknowledgments

This paper is partially supported by the Open Project of the State Key Laboratory for Novel Software Technology (Nos. KFKT2015A03 and KFKT2016B22), the Natural Science Foundation of China (No. 61402258), and the China Postdoctoral Science Foundation (No. 2015M571739).

Author information

Corresponding author

Correspondence to Xuyun Zhang.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Zhang, X., Qi, L., He, Q., Dou, W. (2016). Scalable Iterative Implementation of Mondrian for Big Data Multidimensional Anonymisation. In: Wang, G., Ray, I., Alcaraz Calero, J., Thampi, S. (eds) Security, Privacy and Anonymity in Computation, Communication and Storage. SpaCCS 2016. Lecture Notes in Computer Science, vol 10067. Springer, Cham. https://doi.org/10.1007/978-3-319-49145-5_31

  • DOI: https://doi.org/10.1007/978-3-319-49145-5_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49144-8

  • Online ISBN: 978-3-319-49145-5

  • eBook Packages: Computer Science, Computer Science (R0)
