M-generalization for multipurpose transactional data publication

  • Research Article
Frontiers of Computer Science

Abstract

Collecting and sharing transactional data raises the challenge of preventing information leakage and privacy breaches while preserving high data utility. Data anonymization methods such as perturbation, generalization, and suppression have been proposed for privacy protection. However, many of these methods incur excessive information loss and cannot satisfy multipurpose utility requirements. In this paper, we propose a multidimensional generalization method that provides multipurpose optimization when anonymizing transactional data, offering better data utility for different applications. Our methodology is based on bipartite graphs and combines attribute generalization, item grouping, and outlier perturbation. Experiments on real-life datasets show that our solution considerably improves data utility compared with existing algorithms.
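To make the terms in the abstract concrete, the following minimal Python sketch shows how transactional data can be viewed as a bipartite graph (transactions on one side, items on the other) and how grouping items generalizes each transaction. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the toy transactions, the item-to-group mapping, and the helper names (to_bipartite_edges, generalize, equivalence_classes) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: transaction ID -> set of purchased items.
transactions = {
    "t1": {"beer", "wine", "milk"},
    "t2": {"wine", "cheese"},
    "t3": {"beer", "milk"},
}

# Hypothetical item grouping (an assumption, not taken from the paper).
item_groups = {
    "beer": "alcohol",
    "wine": "alcohol",
    "milk": "dairy",
    "cheese": "dairy",
}


def to_bipartite_edges(data):
    """Return the (transaction, item) edges of the bipartite graph."""
    return sorted((tid, item) for tid, items in data.items() for item in items)


def generalize(data, groups):
    """Replace each item by its group label; unmapped items are kept as-is."""
    return {
        tid: {groups.get(item, item) for item in items}
        for tid, items in data.items()
    }


def equivalence_classes(data):
    """Collect transaction IDs that share an identical item set."""
    classes = defaultdict(list)
    for tid, items in data.items():
        classes[frozenset(items)].append(tid)
    return {key: sorted(tids) for key, tids in classes.items()}


edges = to_bipartite_edges(transactions)
generalized = generalize(transactions, item_groups)
classes = equivalence_classes(generalized)
```

In this toy case every transaction generalizes to the same item set, so the three transactions become indistinguishable from one another, at the cost of no longer knowing which specific items were bought. That tension between privacy and utility is what the abstract refers to as information loss.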

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant Nos. 61662008, 61272535, 61502111), Guangxi “Bagui Scholar” Teams for Innovation and Research Project, Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing, Guangxi Natural Science Foundation (2015GXNSFBA139246, 2014GXNSFBA118288 and 2013GXNSFBA019263), and Guangxi Special Project of Science and Technology Base and Talents (AD16380008).

Author information

Corresponding author

Correspondence to Li-E Wang.

Additional information

Xianxian Li is a professor at Guangxi Normal University, China and a PhD supervisor at Beihang University, China. His current research interests mainly include data security and software theory.

Peipei Sui received her bachelor's and master's degrees from Yanshan University, China. She is currently pursuing a PhD degree at the School of Computer Science and Engineering, Beihang University, China. Her current research interests include social network data, spatial-temporal data mining, and data privacy.

Yan Bai is an associate professor at the University of Washington Tacoma, USA. Her general research interests are in the areas of cyber security and computer networking.

Li-E Wang received her bachelor's and master's degrees from Hunan University, China. She is now an associate professor at Guangxi Normal University, China. Her current research interests mainly include data security and computer networking.

Cite this article

Li, X., Sui, P., Bai, Y. et al. M-generalization for multipurpose transactional data publication. Front. Comput. Sci. 12, 1241–1254 (2018). https://doi.org/10.1007/s11704-016-6061-x

  • DOI: https://doi.org/10.1007/s11704-016-6061-x
