research-article

Distributed Cooperative Coevolution of Data Publishing Privacy and Transparency

Published: 06 September 2023

Abstract

Data transparency benefits data participants’ awareness, fairness to users, and the reproducibility of research. However, transparency requirements cannot be addressed without also considering data privacy. This article defines the multi-objective data publishing (MODP) problem, which optimizes data privacy and transparency simultaneously. To solve it, we propose a distributed cooperative coevolutionary genetic algorithm (DCCGA). In the DCCGA population, each individual represents an anonymization solution to MODP. Three modules in DCCGA, namely the grouping module, the cooperative coevolutionary module, and the evolving module, handle distributed sub-population update and evaluation, improving DCCGA’s optimization performance and parallel efficiency. Moreover, a matrix-based crossover operator and a matrix-based mutation operator are designed to exchange and adjust anonymization information in the individuals efficiently. Experimental results demonstrate that the proposed DCCGA outperforms competing algorithms in solution accuracy, convergence speed, and scalability. We also verify the effectiveness of each proposed component of DCCGA.
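To make the idea of matrix-based operators and two competing objectives more concrete, the following is a minimal illustrative sketch, not the operators defined in the article. It assumes a hypothetical encoding in which an individual is a binary matrix (one row per record, one column per attribute, 1 = published, 0 = suppressed). The function names `row_crossover`, `flip_mutation`, and `dominates` are invented for this illustration, and both objectives are assumed to be maximized.

```python
import random
from typing import List, Tuple

# Hypothetical encoding (an assumption, not taken from the article):
# one row per record, one column per attribute;
# 1 means the value is published, 0 means it is suppressed.
Matrix = List[List[int]]


def row_crossover(a: Matrix, b: Matrix, rng: random.Random) -> Tuple[Matrix, Matrix]:
    """Exchange whole rows between two parents, each with probability 0.5."""
    child_a, child_b = [], []
    for row_a, row_b in zip(a, b):
        if rng.random() < 0.5:
            row_a, row_b = row_b, row_a
        # Copy rows so the children never alias the parents.
        child_a.append(list(row_a))
        child_b.append(list(row_b))
    return child_a, child_b


def flip_mutation(m: Matrix, rate: float, rng: random.Random) -> Matrix:
    """Flip each entry independently with probability `rate`."""
    return [[1 - v if rng.random() < rate else v for v in row] for row in m]


def dominates(f: Tuple[float, float], g: Tuple[float, float]) -> bool:
    """Pareto dominance for two objectives, both assumed to be maximized."""
    no_worse = all(x >= y for x, y in zip(f, g))
    strictly_better = any(x > y for x, y in zip(f, g))
    return no_worse and strictly_better
```

Under these assumptions, a solution with objective values (privacy, transparency) = (0.8, 0.6) dominates (0.7, 0.6), while (0.8, 0.4) and (0.5, 0.9) are mutually non-dominated, which is why a multi-objective algorithm returns a set of trade-off solutions rather than a single one.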

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 1
January 2024
854 pages
EISSN:1556-472X
DOI:10.1145/3613504

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 September 2023
Online AM: 07 August 2023
Accepted: 28 July 2023
Revised: 28 June 2023
Received: 08 March 2023
Published in TKDD Volume 18, Issue 1

Author Tags

  1. Large-scale multi-objective optimization
  2. data privacy and transparency
  3. genetic algorithm
  4. cooperative coevolution
