Survey on Privacy-Preserving Techniques for Microdata Publication

Published: 17 July 2023

Abstract

The exponential growth of collected, processed, and shared microdata has given rise to concerns about individuals’ privacy. As a result, laws and regulations have emerged to control what organisations do with microdata and how they protect it. Statistical Disclosure Control seeks to reduce the risk of confidential information disclosure by de-identifying the data. Such de-identification is guaranteed through privacy-preserving techniques (PPTs). However, de-identification usually results in a loss of information, with a possible impact on data analysis precision and model predictive performance. The main goal is to protect the individual’s privacy while maintaining the interpretability of the data (i.e., its usefulness). Statistical Disclosure Control is an expanding area that needs further exploration, since there is still no solution that guarantees optimal privacy and utility. This survey focuses on all steps of the de-identification process. We present existing PPTs used in microdata de-identification, privacy measures suitable for several disclosure types, and information loss and predictive performance measures. We also discuss the main challenges raised by privacy constraints, describe the main approaches to handle these obstacles, review the taxonomies of PPTs, provide a theoretical analysis of existing comparative studies, and raise multiple open issues.

  132. [132] Li Boyu, Liu Yanheng, Han Xu, and Zhang Jindong. 2017. Cross-bucket generalization for information and privacy preservation. IEEE Transactions on Knowledge and Data Engineering 30, 3 (2017), 449459.Google ScholarGoogle ScholarCross RefCross Ref
  133. [133] Li Jiuyong, Liu Jixue, Baig Muzammil, and Wong Raymond Chi-Wing. 2011. Information based data anonymization for classification utility. Data & Knowledge Engineering 70, 12 (2011), 10301045.Google ScholarGoogle ScholarDigital LibraryDigital Library
  134. [134] Li Jiexun, Wang G. Alan, and Chen Hsinchun. 2011. Identity matching using personal and social identity features. Information Systems Frontiers 13, 1 (2011), 101113.Google ScholarGoogle ScholarDigital LibraryDigital Library
  135. [135] Li Ninghui, Li Tiancheng, and Venkatasubramanian Suresh. 2007. T-closeness: Privacy beyond k-anonymity and l-diversity. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering. IEEE, Los Alamitos, CA, 106115.Google ScholarGoogle ScholarCross RefCross Ref
  136. [136] Li Tiancheng and Li Ninghui. 2009. On the tradeoff between privacy and utility in data publishing. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 517526.Google ScholarGoogle ScholarDigital LibraryDigital Library
  137. [137] Li Tiancheng, Li Ninghui, Zhang Jian, and Molloy Ian. 2010. Slicing: A new approach for privacy preserving data publishing. IEEE Transactions on Knowledge and Data Engineering 24, 3 (2010), 561574.Google ScholarGoogle ScholarDigital LibraryDigital Library
  138. [138] Liao Dan, Li Hui, Sun Gang, Zhang Ming, and Chang Victor. 2018. Location and trajectory privacy preservation in 5G-enabled vehicle social network services. Journal of Network and Computer Applications 110 (2018), 108118.Google ScholarGoogle ScholarCross RefCross Ref
  139. [139] Lin Jun-Lin, Chang Pei-Chann, Liu Julie Yu-Chih, and Wen Tsung-Hsien. 2010. Comparison of microaggregation approaches on anonymized data quality. Expert Systems with Applications 37, 12 (2010), 81618165.Google ScholarGoogle ScholarDigital LibraryDigital Library
  140. [140] Little Roderick J. A.. 1993. Statistical analysis of masked data. Journal of Official Statistics 9, 2 (1993), 407.Google ScholarGoogle Scholar
  141. [141] Little Roderick J. A., Liu Fang, and Raghunathan Trivellore E.. 2004. Statistical disclosure techniques based on multiple imputation. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives: An Essential Journey with Donald Rubin’s Statistical Family, Andrew Gelman and Xiao-Li Meng (Eds.). Wiley, 141152.Google ScholarGoogle Scholar
  142. [142] Liu Jiaxiang, Oya Simon, and Kerschbaum Florian. 2021. Generalization techniques empirically outperform differential privacy against membership inference. arXiv preprint arXiv:2110.05524 (2021). https://arxiv.org/abs/2110.05524.Google ScholarGoogle Scholar
  143. [143] Liu Kun, Liu Wenyan, Cheng Junhong, and Lu Xingjian. 2019. UHRP: Uncertainty-based pruning method for anonymized data linear regression. In Proceedings of the International Conference on Database Systems for Advanced Applications. 1933.Google ScholarGoogle ScholarDigital LibraryDigital Library
  144. [144] Liu Tianen, Wang Yingjie, Cai Zhipeng, Tong Xiangrong, Pan Qingxian, and Zhao Jindong. 2020. A dynamic privacy protection mechanism for spatiotemporal crowdsourcing. Security and Communication Networks 2020 (2020), 1–14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  145. [145] Liu Yi, Peng Jialiang, James J. Q., and Wu Yi. 2019. PPGAN: Privacy-preserving generative adversarial network. In Proceedings of the 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS’19). IEEE, Los Alamitos, CA, 985989.Google ScholarGoogle ScholarCross RefCross Ref
  146. [146] Lyu Lingjuan, Law Yee Wei, Ng Kee Siong, Xue Shibei, Zhao Jun, Yang Mengmeng, and Liu Lei. 2020. Towards distributed privacy-preserving prediction. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC’20). IEEE, Los Alamitos, CA, 41794184.Google ScholarGoogle ScholarDigital LibraryDigital Library
  147. [147] Machanavajjhala Ashwin, Kifer Daniel, Abowd John, Gehrke Johannes, and Vilhuber Lars. 2008. Privacy: Theory meets practice on the map. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering. IEEE, Los Alamitos, CA, 277286.Google ScholarGoogle ScholarDigital LibraryDigital Library
  148. [148] Machanavajjhala Ashwin, Kifer Daniel, Gehrke Johannes, and Venkitasubramaniam Muthuramakrishnan. 2007. l-Diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data 1, 1 (2007), 3–es.Google ScholarGoogle ScholarDigital LibraryDigital Library
  149. [149] Mackey Elaine, Elliot Mark, and O’Hara Kieron. 2016. The Anonymisation Decision-Making Framework. UKAN Publications.Google ScholarGoogle Scholar
  150. [150] Majeed Abdul and Lee Sungchang. 2021. Anonymization techniques for privacy preserving data publishing: A comprehensive survey. IEEE Access 9 (2021), 85128545.Google ScholarGoogle ScholarCross RefCross Ref
  151. [151] Manning Anna M., Haglin David J., and Keane John A.. 2008. A recursive search algorithm for statistical disclosure assessment. Data Mining and Knowledge Discovery 16, 2 (2008), 165196.Google ScholarGoogle ScholarDigital LibraryDigital Library
  152. [152] Martínez Sergio, Sánchez David, and Valls Aida. 2012. Semantic adaptive microaggregation of categorical microdata. Computers & Security 31, 5 (2012), 653672.Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. [153] Mateo-Sanz Josep Maria, Sebé Francesc, and Domingo-Ferrer Josep. 2004. Outlier protection in continuous microdata masking. In Proceedings of the International Workshop on Privacy in Statistical Databases. 201215.Google ScholarGoogle ScholarCross RefCross Ref
  154. [154] Matthews Gregory J. and Harel Ofer. 2011. Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy. Statistics Surveys 5 (2011), 129.Google ScholarGoogle ScholarCross RefCross Ref
  155. [155] Matwin Stan, Nin Jordi, Sehatkar Morvarid, and Szapiro Tomasz. 2015. A review of attribute disclosure control. In Advanced Research in Data Privacy. Studies in Computational Intelligence, Vol. 567. Springer, 4161.Google ScholarGoogle ScholarCross RefCross Ref
  156. [156] Mitchell Margaret, Wu Simone, Zaldivar Andrew, Barnes Parker, Vasserman Lucy, Hutchinson Ben, Spitzer Elena, Raji Inioluwa Deborah, and Gebru Timnit. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 220229.Google ScholarGoogle ScholarDigital LibraryDigital Library
  157. [157] Mivule Kato. 2013. Utilizing noise addition for data privacy, an overview. arXiv preprint arXiv:1309.3958 (2013).Google ScholarGoogle Scholar
  158. [158] Mivule Kato and Turner Claude. 2013. A comparative analysis of data privacy and utility parameter adjustment, using machine learning classification as a gauge. Procedia Computer Science 20 (2013), 414419.Google ScholarGoogle ScholarCross RefCross Ref
  159. [159] Mivule Kato, Turner Claude, and Ji Soo-Yeon. 2012. Towards a differential privacy and utility preserving machine learning classifier. Procedia Computer Science 12 (2012), 176181.Google ScholarGoogle ScholarCross RefCross Ref
  160. [160] Mohammed Noman, Fung Benjamin C. M., Hung Patrick C. K., and Lee Cheuk-Kwong. 2010. Centralized and distributed anonymization for high-dimensional healthcare data. ACM Transactions on Knowledge Discovery from Data 4, 4 (2010), 133.Google ScholarGoogle ScholarDigital LibraryDigital Library
  161. [161] Moore Richard. 1996. Controlled Data-Swapping Techniques for Masking Public Use Microdata Sets. U.S. Census Bureau.Google ScholarGoogle Scholar
  162. [162] AI MOSTLY. 2017. MOSTLY AI. Retrieved December 1, 2022 from https://mostly.ai/.Google ScholarGoogle Scholar
  163. [163] AI MOSTLY. 2020. Virtual Data Lab (VDL). Retrieved December 1, 2022 from https://github.com/mostly-ai/virtualdatalab.Google ScholarGoogle Scholar
  164. [164] Muralidhar Krishnamurty and Domingo-Ferrer Josep. 2016. Rank-based record linkage for re-identification risk assessment. In Proceedings of the International Conference on Privacy in Statistical Databases. 225236.Google ScholarGoogle ScholarCross RefCross Ref
  165. [165] Muralidhar Krishnamurty, Domingo-Ferrer Josep, and Martínez Sergio. 2020. \(\epsilon\)-Differential privacy for microdata releases does not guarantee confidentiality (let alone utility). In Proceedings of the International Conference on Privacy in Statistical Databases. 2131.Google ScholarGoogle ScholarDigital LibraryDigital Library
  166. [166] Muralidhar Krishnamurty and Sarathy Rathindra. 2003. A theoretical basis for perturbation methods. Statistics and Computing 13, 4 (2003), 329335.Google ScholarGoogle ScholarDigital LibraryDigital Library
  167. [167] Muralidhar Krishnamurty and Sarathy Rathindra. 2003. A rejoinder to the comments by Polettini and Stander. Statistics and Computing 13, 4 (2003), 339342.Google ScholarGoogle ScholarDigital LibraryDigital Library
  168. [168] Muralidhar Krishnamurty and Sarathy Rathindra. 2006. Data shuffling—A new masking approach for numerical data. Management Science 52, 5 (2006), 658670.Google ScholarGoogle ScholarDigital LibraryDigital Library
  169. [169] Muralidhar Krish, Sarathy Rathindra, and Dandekar Ramesh. 2006. Why swap when you can shuffle? A comparison of the proximity swap and data shuffle for numeric data. In Proceedings of the International Conference on Privacy in Statistical Databases. 164176.Google ScholarGoogle ScholarDigital LibraryDigital Library
  170. [170] Jr. Jeffrey Murray, Mashhadi Afra, Lagesse Brent, and Stiber Michael. 2021. Privacy preserving techniques applied to CPNI data: Analysis and recommendations. arXiv preprint arXiv:2101.09834 (2021).Google ScholarGoogle Scholar
  171. [171] Nanni Mirco, Andrienko Gennady, Barabási Albert-László, Boldrini Chiara, Bonchi Francesco, Cattuto Ciro, Chiaromonte Francesca, et al. 2021. Give more data, awareness and control to individual citizens, and they will help COVID-19 containment. Ethics and Information Technology 23, 1 (2021), 16.Google ScholarGoogle ScholarDigital LibraryDigital Library
  172. [172] Narayanan Arvind and Shmatikov Vitaly. 2008. Robust de-anonymization of large sparse datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy (SP’08). IEEE, Los Alamitos, CA, 111125.Google ScholarGoogle ScholarDigital LibraryDigital Library
  173. [173] Nawaz Asif and Kazemian Hassan. 2021. A fuzzy approach to identity resolution. In Proceedings of the International Conference on Engineering Applications of Neural Networks. 307318.Google ScholarGoogle ScholarCross RefCross Ref
  174. [174] Nayak Tapan K., Sinha Bimal, and Zayatz Laura. 2011. Statistical properties of multiplicative noise masking for confidentiality protection. Journal of Official Statistics 27, 3 (2011), 527.Google ScholarGoogle Scholar
  175. [175] Nergiz Mehmet Ercan, Atzori Maurizio, and Clifton Chris. 2007. Hiding the presence of individuals from shared databases. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. 665676.Google ScholarGoogle ScholarDigital LibraryDigital Library
  176. [176] Nergiz M. Ercan and Clifton Chris. 2007. Thoughts on k-anonymization. Data & Knowledge Engineering 63, 3 (2007), 622645.Google ScholarGoogle ScholarDigital LibraryDigital Library
  177. [177] Nin Jordi, Herranz Javier, and Torra Vicenç. 2008. Rethinking rank swapping to decrease disclosure risk. Data & Knowledge Engineering 64, 1 (2008), 346364.Google ScholarGoogle ScholarDigital LibraryDigital Library
  178. [178] Nowok Beata. 2015. Utility of synthetic microdata generated using tree-based methods. In Proceedings of the UNECE Statistical Data Confidentiality Work Session. 1–11.Google ScholarGoogle Scholar
  179. [179] Ochoa Salvador, Rasmussen Jamie, Robson Christine, and Salib Michael. 2001. Reidentification of Individuals in Chicago’s Homicide Database: A Technical and Legal Study. Massachusetts Institute of Technology, Cambridge, MA.Google ScholarGoogle Scholar
  180. [180] Ohm Paul. 2009. Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review 57 (2009), 1701.Google ScholarGoogle Scholar
  181. [181] Ohno-Machado Lucila, Vinterbo Staal, and Dreiseitl Stephan. 2002. Effects of data anonymization by cell suppression on descriptive statistics and predictive modeling performance. Journal of the American Medical Informatics Association 9, Suppl. 6 (2002), 115119.Google ScholarGoogle ScholarCross RefCross Ref
  182. [182] Oliveira Stanley R. M. and Zaiane Osmar R.. 2010. Privacy preserving clustering by data transformation. Journal of Information and Data Management 1, 1 (2010), 37.Google ScholarGoogle Scholar
  183. [183] OpenAIRE. 2021. Amnesia. Retrieved November 1, 2021 from https://amnesia.openaire.eu.Google ScholarGoogle Scholar
  184. [184] Orooji Marmar and Knapp Gerald M.. 2019. Improving suppression to reduce disclosure risk and enhance data utility. arXiv preprint arXiv:1901.00716 (2019).Google ScholarGoogle Scholar
  185. [185] Pagliuca D. and Seri G.. 1999. Some Results of Individual Ranking Method on the System of Enterprise Accounts Annual Survey. Esprit SDC Project, Deliverable MI-3/S1. Esprit.Google ScholarGoogle Scholar
  186. [186] Patki Neha, Wedge Roy, and Veeramachaneni Kalyan. 2016. The synthetic data vault. In Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA’16). 399410. Google ScholarGoogle ScholarCross RefCross Ref
  187. [187] Peiffer-Smadja Nathan, Maatoug Redwan, Lescure François-Xavier, D’ortenzio Eric, Pineau Joëlle, and King Jean-Rémi. 2020. Machine learning for COVID-19 needs global collaboration and data-sharing. Nature Machine Intelligence 2, 6 (2020), 293294.Google ScholarGoogle ScholarCross RefCross Ref
  188. [188] Ping Haoyue, Stoyanovich Julia, and Howe Bill. 2017. DataSynthesizer: Privacy-preserving synthetic datasets. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management. 15.Google ScholarGoogle ScholarDigital LibraryDigital Library
  189. [189] Prasser Fabian, Eicher Johanna, Spengler Helmut, Bild Raffael, and Kuhn Klaus A.. 2020. Flexible data anonymization using ARX—Current status and challenges ahead. Software: Practice and Experience 50, 7 (2020), 12771304.Google ScholarGoogle ScholarCross RefCross Ref
  190. [190] Prasser Fabian, Kohlmayer Florian, and Kuhn Klaus A.. 2016. The importance of context: Risk-based de-identification of biomedical data. Methods of Information in Medicine 55, 4 (2016), 347355.Google ScholarGoogle ScholarCross RefCross Ref
  191. [191] Radanliev Petar, Roure David De, and Walton Rob. 2020. Data mining and analysis of scientific research data records on Covid-19 mortality, immunity, and vaccine development—In the first wave of the Covid-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews 14, 5 (2020), 11211132.Google ScholarGoogle ScholarCross RefCross Ref
  192. [192] Rand William M.. 1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66, 336 (1971), 846850.Google ScholarGoogle ScholarDigital LibraryDigital Library
  193. [193] Rankin Debbie, Black Michaela, Bond Raymond, Wallace Jonathan, Mulvenna Maurice, and Gorka Epelde. 2020. Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8, 7 (2020), e18910.Google ScholarGoogle ScholarCross RefCross Ref
  194. [194] Reiter Jerome P.. 2005. Estimating risks of identification disclosure in microdata. Journal of the American Statistical Association 100, 472 (2005), 11031112.Google ScholarGoogle ScholarCross RefCross Ref
  195. [195] Reiter Jerome P.. 2005. Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21, 3 (2005), 441.Google ScholarGoogle Scholar
  196. [196] Rijsbergen C. J. Van. 1979. Information Retrieval. Butterworth-Heinemann.Google ScholarGoogle ScholarDigital LibraryDigital Library
  197. [197] Ritchie Felix. 2009. UK release practices for official microdata. Statistical Journal of the IAOS 26, 3, 4 (2009), 103111.Google ScholarGoogle Scholar
  198. [198] Rocher Luc, Hendrickx Julien M., and Montjoye Yves-Alexandre De. 2019. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications 10, 1 (2019), 19.Google ScholarGoogle ScholarCross RefCross Ref
  199. [199] Rockett Ian R. H., Caine Eric D., Connery Hilary S., D’Onofrio Gail, Gunnell David J., Miller Ted R., Nolte Kurt B., et al. 2018. Discerning suicide in drug intoxication deaths: Paucity and primacy of suicide notes and psychiatric history. PLoS One 13, 1 (2018), e0190200.Google ScholarGoogle ScholarCross RefCross Ref
  200. [200] Rohilla Shivani and Bhardwaj Manish. 2017. Efficient anonymization algorithms to prevent generalized losses and membership disclosure in microdata. American Journal of Data Mining and Knowledge Discovery 2, 2 (2017), 5461.Google ScholarGoogle Scholar
  201. [201] Rosenblatt Lucas, Liu Xiaoyan, Pouyanfar Samira, Leon Eduardo de, Desai Anuj, and Allen Joshua. 2020. Differentially private synthetic data: Applied evaluations and enhancements. arXiv preprint arXiv:2011.05537 (2020).Google ScholarGoogle Scholar
  202. [202] Rousseeuw Peter J.. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20 (1987), 5365.Google ScholarGoogle ScholarDigital LibraryDigital Library
  203. [203] Rubin Donald B.. 1993. Discussion statistical disclosure limitation. Journal of Official Statistics 9, 2 (1993), 461.Google ScholarGoogle Scholar
  204. [204] Rustad Michael L. and Koenig Thomas H.. 2019. Towards a global data privacy standard. Florida Law Review 71 (2019), 365.Google ScholarGoogle Scholar
  205. [205] Group Safe Data Access Professionals Working. 2019. Handbook on Statistical Disclosure Control for Outputs. Retrieved November 1, 2022 from https://ukdataservice.ac.uk/app/uploads/thf_datareport_aw_web.pdf.Google ScholarGoogle Scholar
  206. [206] Samarati Pierangela. 2001. Protecting respondents identities in microdata release. IEEE Transactions on Knowledge and Data Engineering 13, 6 (2001), 10101027.Google ScholarGoogle ScholarDigital LibraryDigital Library
  207. [207] Sari W. Widodo, Irma Permata, and Murien Nugraheni. 2020. ASENVA: Summarizing anatomy model by aggregating sensitive values. In Proceedings of the 2020 International Conference on Electrical Engineering and Informatics (ICELTICs’20). IEEE, Los Alamitos, CA, 14.Google ScholarGoogle Scholar
  208. [208] Skinner C. J. and Holmes David J.. 1998. Estimating the re-identification risk per record in microdata. Journal of Official Statistics 14, 4 (1998), 361.Google ScholarGoogle Scholar
  209. [209] Skinner Chris, Marsh Catherine, Openshaw Stan, and Wymer Colin. 1994. Disclosure control for census microdata. Journal of Official Statistics–Stockholm 10 (1994), 31.Google ScholarGoogle Scholar
  210. [210] Skinner Chris J. and Elliot M. J.. 2002. A measure of disclosure risk for microdata. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64, 4 (2002), 855867.Google ScholarGoogle ScholarCross RefCross Ref
  211. [211] Soria-Comas Jordi, Domingo-Ferrer Josep, Sánchez David, and Martínez Sergio. 2014. Enhancing data utility in differential privacy via microaggregation-based k-anonymity. VLDB Journal 23, 5 (2014), 771794.Google ScholarGoogle ScholarDigital LibraryDigital Library
  212. [212] Soria-Comas Jordi, Domingo-Ferrer Josep, Sánchez David, and Martínez Sergio. 2014. Enhancing data utility in differential privacy via microaggregation-based k-anonymity. The VLDB Journal 23, 5 (2014), 771794.Google ScholarGoogle ScholarDigital LibraryDigital Library
  213. [213] Spruill Nancy. 1983. The confidentiality and analytic usefulness of masked business microdata. Proceedings of the Section on Survey Research Methods 1983 (1983), 602607.Google ScholarGoogle Scholar
  214. [214] Netherlands Statistics. 2014. \(\mu\)-ARGUS. Retrieved November 1, 2021 from https://github.com/sdcTools/muargus.Google ScholarGoogle Scholar
  215. [215] Sullivan Gary R.. 1989. The Use of Added Error to Avoid Disclosure in Microdata Releases. Ph. D. Dissertation. Iowa State University.Google ScholarGoogle ScholarDigital LibraryDigital Library
  216. [216] Susan V. Shyamala and Christopher T.. 2016. Anatomisation with slicing: A new privacy preservation approach for multiple sensitive attributes. SpringerPlus 5, 1 (2016), 121.Google ScholarGoogle ScholarCross RefCross Ref
  217. [217] Sweeney Latanya. 2000. Simple demographics often identify people uniquely. Health (San Francisco) 671, 2000 (2000), 134.Google ScholarGoogle Scholar
  218. [218] Sweeney Latanya. 2002. k-Anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 557570.Google ScholarGoogle ScholarDigital LibraryDigital Library
  219. [219] Akimichi Takemura. 1999. Local Recoding by Maximum Weight Matching for Disclosure Control of Microdata Sets. CIRJE F-Series CIRJE-F-40, CIRJE, Faculty of Economics, University of Tokyo.Google ScholarGoogle Scholar
  220. [220] Akimichi Takemura. 1999. Some superpopulation models for estimating the number of population uniques. In Proceedings of the Conference on Statistical Data Protection. 4558.Google ScholarGoogle Scholar
  221. [221] Tao Yufei, Chen Hekang, Xiao Xiaokui, Zhou Shuigeng, and Zhang Donghui. 2009. Angel: Enhancing the utility of generalization for privacy preserving publication. IEEE Transactions on Knowledge and Data Engineering 21, 7 (2009), 10731087.Google ScholarGoogle ScholarDigital LibraryDigital Library
  222. [222] Templ Matthias, Kowarik Alexander, and Meindl Bernhard. 2015. Statistical disclosure control for micro-data using the R package sdcMicro. Journal of Statistical Software 67, 4 (2015), 136.Google ScholarGoogle ScholarCross RefCross Ref
  223. [223] Templ Matthias and Meindl Bernhard. 2008. Robust statistics meets SDC: New disclosure risk measures for continuous microdata masking. In Proceedings of the International Conference on Privacy in Statistical Databases. 177189.Google ScholarGoogle ScholarDigital LibraryDigital Library
  224. [224] Tendick Patrick. 1991. Optimal noise addition for preserving confidentiality in multivariate data. Journal of Statistical Planning and Inference 27, 3 (1991), 341353.Google ScholarGoogle ScholarCross RefCross Ref
  225. [225] Torra Vicenç. 2004. Microaggregation for categorical variables: A median based approach. In Proceedings of the International Workshop on Privacy in Statistical Databases. 162174.Google ScholarGoogle ScholarCross RefCross Ref
  226. [226] Torra Vicenç. 2017. Privacy models and disclosure risk measures. In Data Privacy: Foundations, New Developments and the Big Data Challenge. Springer, 111189.Google ScholarGoogle Scholar
  227. [227] Torra Vicenç. 2022. Guide to Data Privacy: Models, Technologies, Solutions. Springer Nature.Google ScholarGoogle Scholar
  228. [228] Torra Vicenç, Abowd John M., and Domingo-Ferrer Josep. 2006. Using Mahalanobis distance-based record linkage for disclosure risk assessment. In Proceedings of the International Conference on Privacy in Statistical Databases. 233242.Google ScholarGoogle ScholarDigital LibraryDigital Library
  229. [229] Truta Traian Marius, Fotouhi Farshad, and Barth-Jones Daniel. 2006. Global disclosure risk for microdata with continuous attributes. In Privacy and Technologies of Identity. Springer, 349363.Google ScholarGoogle ScholarCross RefCross Ref
  230. [230] Truta Traian Marius and Vinay Bindu. 2006. Privacy protection: P-sensitive k-anonymity property. In Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW’06). IEEE, Los Alamitos, CA, 94.Google ScholarGoogle ScholarDigital LibraryDigital Library
  231. [231] Lab UT Dallas Data Security and Privacy. 2012. UTD Anonymisation ToolBox. http://cs.utdallas.edu/dspl/cgi-bin/toolbox/. Accessed Nov 2021.Google ScholarGoogle Scholar
  232. [232] Vaidya Jaideep and Clifton Chris. 2004. Privacy-preserving outlier detection. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM’04). IEEE, Los Alamitos, CA, 233240.Google ScholarGoogle ScholarDigital LibraryDigital Library
  233. [233] Vanichayavisalsakul Peerapong and Piromsopa Krerk. 2018. An evaluation of anonymized models and ensemble classifiers. In Proceedings of the 2018 2nd International Conference on Big Data and Internet of Things. 1822.Google ScholarGoogle ScholarDigital LibraryDigital Library
  234. [234] Wagner Isabel and Eckhoff David. 2018. Technical privacy metrics: A systematic survey. ACM Computing Surveys 51, 3 (2018), 138.Google ScholarGoogle ScholarDigital LibraryDigital Library
  235. [235] Wang Ke and Fung Benjamin C. M.. 2006. Anonymizing sequential releases. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 414423.Google ScholarGoogle ScholarDigital LibraryDigital Library
  236. [236] Wang Ke, Xu Yabo, Wong Raymond Chi-Wing, and Fu Ada Wai-Chee. 2010. Anonymizing temporal data. In Proceedings of the 2010 IEEE International Conference on Data Mining. IEEE, Los Alamitos, CA, 11091114.Google ScholarGoogle ScholarDigital LibraryDigital Library
  237. [237] Wang Ke, Yu Philip S., and Chakraborty Sourav. 2004. Bottom-up generalization: A data mining solution to privacy protection. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM’04). IEEE, Los Alamitos, CA, 249256.Google ScholarGoogle ScholarCross RefCross Ref
  238. [238] Weng Cheng G. and Poon Josiah. 2008. A new evaluation measure for imbalanced datasets. In Proceedings of the 7th Australasian Data Mining Conference, Vol. 87 2732.Google ScholarGoogle ScholarDigital LibraryDigital Library
  239. [239] Willenborg Leon and Waal Ton De. 1996. Statistical Disclosure Control in Practice. Vol. 111. Springer Science & Business Media.Google ScholarGoogle ScholarCross RefCross Ref
  240. [240] Willenborg Leon Cornelis Roelof Johannes and Waal Ton De. 2000. Elements of Statistical Disclosure Control. Lecture Notes in Statistics, Vol. 144. Springer.Google ScholarGoogle Scholar
  241. [241] Wilson Rick L. and Rosen Peter A.. 2003. Protecting data through perturbation techniques: The impact on knowledge discovery in databases. Journal of Database Management 14, 2 (2003), 1426.Google ScholarGoogle ScholarCross RefCross Ref
  242. [242] Wong Raymond Chi-Wing, Li Jiuyong, Fu Ada Wai-Chee, and Wang Ke. 2006. (\(\alpha\), k)-Anonymity: An enhanced k-anonymity model for privacy preserving data publishing. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 754759.Google ScholarGoogle ScholarDigital LibraryDigital Library
  243. [243] Xiao Xiaokui and Tao Yufei. 2006. Anatomy: Simple and effective privacy preservation. In Proceedings of the 32nd International Conference on Very Large Data Bases. 139150.Google ScholarGoogle ScholarDigital LibraryDigital Library
  244. [244] Xiao Xiaokui and Tao Yufei. 2007. M-invariance: towards privacy preserving re-publication of dynamic datasets. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. 689700.Google ScholarGoogle ScholarDigital LibraryDigital Library
  245. [245] Xie Liyang, Lin Kaixiang, Wang Shu, Wang Fei, and Zhou Jiayu. 2018. Differentially private generative adversarial network. arXiv preprint arXiv:1802.06739 (2018).Google ScholarGoogle Scholar
  246. [246] Xu Jian, Wang Wei, Pei Jian, Wang Xiaoyuan, Shi Baile, and Fu Ada Wai-Chee. 2006. Utility-based anonymization using local recoding. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785790.Google ScholarGoogle ScholarDigital LibraryDigital Library
  247. [247] Xu Lei, Skoularidou Maria, Cuesta-Infante Alfredo, and Veeramachaneni Kalyan. 2019. Modeling tabular data using conditional GAN. In Advances in Neural Information Processing Systems 32.Google ScholarGoogle Scholar
  248. [248] Yale Andrew, Dash Saloni, Dutta Ritik, Guyon Isabelle, Pavao Adrien, and Bennett Kristin P.. 2020. Generation and evaluation of privacy preserving synthetic health data. Neurocomputing 416 (2020), 244255.Google ScholarGoogle ScholarCross RefCross Ref
  249. [249] YData. 2019. YData. Retrieved December 1, 2022 from https://ydata.ai/.Google ScholarGoogle Scholar
  250. [250] YData. 2021. YData Synthetic. Retrieved December 1, 2022 from https://github.com/ydataai/ydata-synthetic.Google ScholarGoogle Scholar
  251. [251] Ye Yifan, Wang Lixxia, Han Jianmin, Qiu Sheng, and Luo Fangwei. 2017. An anonymization method combining anatomy and permutation for protecting privacy in microdata with multiple sensitive attributes. In Proceedings of the 2017 International Conference on Machine Learning and Cybernetics (ICMLC’17), Vol. 2. IEEE, Los Alamitos, CA, 404411.Google ScholarGoogle ScholarCross RefCross Ref
  252. [252] Yeom Samuel, Giacomelli Irene, Fredrikson Matt, and Jha Somesh. 2018. Privacy risk in machine learning: Analyzing the connection to overfitting. In Proceedings of the 2018 IEEE 31st Computer Security Foundations Symposium (CSF’18). IEEE, Los Alamitos, CA, 268282.Google ScholarGoogle ScholarCross RefCross Ref
  253. [253] Zhang Qing, Koudas Nick, Srivastava Divesh, and Yu Ting. 2007. Aggregate query answering on anonymized tables. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering. 116125.Google ScholarGoogle ScholarCross RefCross Ref
  254. [254] Zhao Benjamin Zi Hao, Agrawal Aviral, Coburn Catisha, Asghar Hassan Jameel, Bhaskar Raghav, Kaafar Mohamed Ali, Webb Darren, and Dickinson Peter. 2021. On the (in)feasibility of attribute inference attacks on machine learning models. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy (EuroS&P’21). IEEE, Los Alamitos, CA, 232–251.
  255. [255] Zhiwei Kong, Weimin Wei, Shuo Yang, Hua Feng, and Yan Zhao. 2017. Research progress of anonymous data release. In Proceedings of the 2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS’17). IEEE, Los Alamitos, CA, 226–230.
  256. [256] Zigomitros Athanasios, Casino Fran, Solanas Agusti, and Patsakis Constantinos. 2020. A survey on privacy properties for data publishing of relational data. IEEE Access 8 (2020), 51071–51099.
  257. [257] Zorarpacı Ezgi and Özel Selma Ayşe. 2020. Privacy preserving classification over differentially private data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. Early access, December 13, 2020.

        • Published in

          ACM Computing Surveys  Volume 55, Issue 14s
          December 2023
          1355 pages
          ISSN:0360-0300
          EISSN:1557-7341
          DOI:10.1145/3606253

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 July 2023
          • Online AM: 28 March 2023
          • Accepted: 10 March 2023
          • Revised: 25 December 2022
          • Received: 19 January 2022

          Qualifiers

          • survey
