An Information-Driven Genetic Algorithm for Privacy-Preserving Data Publishing

  • Conference paper
Web Information Systems Engineering – WISE 2022 (WISE 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13724)

Abstract

Due to growing demand for data publishing and rising concerns about data privacy, the privacy-preserving data publishing (PPDP) problem has received considerable attention from research communities, industry, and governments. However, balancing privacy preservation against data quality in PPDP remains challenging. In this paper, an information-driven genetic algorithm (ID-GA) is designed to achieve optimal anonymization based on attribute generalization and record suppression. In ID-GA, an information-driven crossover operator is designed to exchange information efficiently between different anonymization solutions; an information-driven mutation operator is proposed to promote information release during anonymization; and a two-dimension selection operator is designed to assess the quality of different anonymization solutions. Experimental results verify the advantages of ID-GA in terms of solution accuracy and convergence speed. In addition, the impact of each proposed component is verified.
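
The abstract describes anonymization by attribute generalization plus record suppression, searched with a genetic algorithm. The sketch below only illustrates that general setting and is not ID-GA: the paper's information-driven crossover, information-driven mutation and two-dimension selection operators are not specified in this abstract, so the sketch substitutes plain uniform crossover, a one-level mutation, and binary tournament selection with a lexicographic "within suppression budget first, then lower loss" ranking. Everything else is an assumption for illustration: the encoding (one generalization level per quasi-identifier), the k-anonymity-style rule that suppresses records in equivalence classes smaller than k, the toy age/ZIP hierarchies, the loss proxy, the data, and the names `evaluate`, `rank_key` and `genetic_search`.

```python
import random
from collections import Counter

# Hypothetical toy setup: two quasi-identifiers (age, 5-digit ZIP code), each
# with a small generalization hierarchy. MAX_LEVELS[i] is the top level of QI i.
MAX_LEVELS = (2, 3)


def age_level(age: int, level: int) -> str:
    """Level 0: exact age; level 1: decade range; level 2: fully generalized."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def zip_level(zip_code: str, level: int) -> str:
    """Mask the last `level` digits of the ZIP code."""
    return zip_code[: len(zip_code) - level] + "*" * level


def evaluate(levels, records, k):
    """Return (information loss, suppressed count) for one generalization vector."""
    generalized = [(age_level(a, levels[0]), zip_level(z, levels[1])) for a, z in records]
    class_sizes = Counter(generalized)
    # Assumed suppression rule: every record in an equivalence class smaller than k is removed.
    suppressed = sum(size for size in class_sizes.values() if size < k)
    # Simple loss proxy (assumption): mean normalized hierarchy height plus suppressed share.
    height = sum(lvl / top for lvl, top in zip(levels, MAX_LEVELS)) / len(levels)
    return height + suppressed / len(records), suppressed


def rank_key(levels, records, k, max_suppressed):
    """Lexicographic ranking: solutions within the suppression budget first, then lower loss."""
    loss, suppressed = evaluate(levels, records, k)
    return (suppressed > max_suppressed, loss)


def genetic_search(records, k, max_suppressed, pop_size=10, generations=30, seed=0):
    rng = random.Random(seed)

    def rank(ind):
        return rank_key(ind, records, k, max_suppressed)

    population = [tuple(rng.randint(0, top) for top in MAX_LEVELS) for _ in range(pop_size)]
    best = min(population, key=rank)
    for _ in range(generations):
        offspring = [best]  # elitism: the best solution so far always survives
        while len(offspring) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(population, 2), key=rank)
            p2 = min(rng.sample(population, 2), key=rank)
            # Uniform crossover: each gene is copied from one of the two parents.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: move one gene one level up or down, clamped to its hierarchy.
            i = rng.randrange(len(child))
            child[i] = min(max(child[i] + rng.choice((-1, 1)), 0), MAX_LEVELS[i])
            offspring.append(tuple(child))
        population = offspring
        best = min(population, key=rank)
    return best, evaluate(best, records, k)


# Hypothetical toy records: (age, ZIP code).
RECORDS = [(23, "12345"), (27, "12346"), (25, "12347"),
           (34, "12355"), (36, "12356"), (38, "12358"), (52, "99999")]

if __name__ == "__main__":
    best, (loss, suppressed) = genetic_search(RECORDS, k=2, max_suppressed=1)
    print("generalization levels:", best, "loss:", round(loss, 3), "suppressed:", suppressed)
```

With only two quasi-identifiers this toy search space has 12 points and could simply be enumerated; a genetic search becomes worthwhile when the number of generalization combinations grows exponentially with the number of quasi-identifiers.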

Notes

  1. https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/82xm-y6g8.

Acknowledgements

This work was supported by The Major Key Project of PCL (Grant No. PCL2022A03, PCL2021A02, PCL2021A09).

Author information

Corresponding author

Correspondence to Yong-Feng Ge.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ge, YF., Wang, H., Cao, J., Zhang, Y. (2022). An Information-Driven Genetic Algorithm for Privacy-Preserving Data Publishing. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_24

  • DOI: https://doi.org/10.1007/978-3-031-20891-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20890-4

  • Online ISBN: 978-3-031-20891-1

  • eBook Packages: Computer Science, Computer Science (R0)