
Survey on Privacy-Preserving Techniques for Microdata Publication

Authors:
Tânia Carvalho, University of Porto (ORCID: 0000-0002-7700-1955)
Nuno Moniz, INESC TEC/University of Porto (ORCID: 0000-0003-4322-1076)
Pedro Faria, TekPrivacy (ORCID: 0009-0004-1633-7887)
Luís Antunes, University of Porto (ORCID: 0000-0002-9988-594X)

ACM Computing Surveys, Volume 55, Issue 14s, Article No.: 309, pp. 1–42. https://doi.org/10.1145/3588765

Published: 17 July 2023


Abstract

The exponential growth of collected, processed, and shared microdata has given rise to concerns about individuals’ privacy. As a result, laws and regulations have emerged to control what organisations do with microdata and how they protect it. Statistical Disclosure Control seeks to reduce the risk of disclosing confidential information by de-identifying the data. Such de-identification is achieved through privacy-preserving techniques (PPTs). However, de-identification usually results in loss of information, with a possible impact on data analysis precision and model predictive performance. The main goal is to protect individuals’ privacy while maintaining the interpretability of the data (i.e., its usefulness). Statistical Disclosure Control is an expanding area that needs further exploration, since there is still no solution that guarantees optimal privacy and utility. This survey focuses on all steps of the de-identification process. We present existing PPTs used in microdata de-identification, privacy measures suitable for several disclosure types, and information loss and predictive performance measures. We also discuss the main challenges raised by privacy constraints, describe the main approaches to handle these obstacles, review the taxonomies of PPTs, provide a theoretical analysis of existing comparative studies, and raise multiple open issues.

REFERENCES

[1] Adam Nabil R. and Worthmann John C.. 1989. Security-control methods for statistical databases: A comparative study. ACM Computing Surveys 21, 4 (1989), 515–556.Google ScholarDigital Library
[2] Aggarwal Charu C. and Philip S. Yu. 2008. Privacy-Preserving Data Mining: Models and Algorithms. Springer Science & Business Media.Google ScholarCross Ref
[3] GmbH Aircloak. 2021. Aircloak. Retrieved November 1, 2021 from https://aircloak.com/.Google Scholar
[4] Anjum Adeel, Ahmad Naveed, Malik Saif U. R., Zubair Samiya, and Shahzad Basit. 2018. An efficient approach for publishing microdata for multiple sensitive attributes. Journal of Supercomputing 74, 10 (2018), 5127–5155.Google ScholarDigital Library
[5] Arjovsky Martin, Chintala Soumith, and Bottou Léon. 2017. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning. 214–223.Google Scholar
[6] ARX. 2013. ARX Data Anonymization Tool. Retrieved November 1, 2021 from https://arx.deidentifier.org/.Google Scholar
[7] Bacher Johann, Brand Ruth, and Bender Stefan. 2002. Re-identifying register data by survey data using cluster analysis: An empirical study. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 05 (2002), 589–607.Google ScholarDigital Library
[8] Bagdasaryan Eugene, Poursaeed Omid, and Shmatikov Vitaly. 2019. Differential privacy has disparate impact on model accuracy. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS’19). 15453–15462. https://proceedings.neurips.cc/paper/2019/hash/fc0de4e0396fff257ea362983c2dda5a-Abstract.html.Google Scholar
[9] Bandara Eranga, Liang Xueping, Foytik Peter, Shetty Sachin, Hall Crissie, Bowden Daniel, Ranasinghe Nalin, and Zoysa Kasun De. 2021. A blockchain empowered and privacy preserving digital contact tracing platform. Information Processing & Management 58, 4 (2021), 102572.Google ScholarDigital Library
[10] Bayardo Roberto J. and Agrawal Rakesh. 2005. Data privacy through optimal k-anonymization. In Proceedings of the 21st International Conference on Data Engineering (ICDE’05). IEEE, Los Alamitos, CA, 217–228.Google ScholarDigital Library
[11] Beaulieu-Jones Brett K., Wu Zhiwei Steven, Williams Chris, Lee Ran, Bhavnani Sanjeev P., Byrd James Brian, and Greene Casey S.. 2019. Privacy-preserving generative deep neural networks support clinical data sharing. Circulation: Cardiovascular Quality and Outcomes 12, 7 (2019), e005122.Google ScholarCross Ref
[12] Bellovin Steven M., Dutta Preetam K., and Reitinger Nathan. 2019. Privacy and synthetic datasets. Stanford Technology Law Review 22 (2019), 1.Google Scholar
[13] Benedetti Roberto, Capobianchi A., and Franconi L.. 1998. Individual risk of disclosure using sampling design information. Contributi Istat 1412003 (1998), 1–15.Google Scholar
[14] Benschop Thijs, Machingauta Cathrine, and Welch Matthew. 2019. Statistical disclosure control: A practice guide. Read the Docs. Retrieved April 5, 2023 from https://buildmedia.readthedocs.org/media/pdf/sdcpractice/latest/sdcpractice.pdf.Google Scholar
[15] Bethlehem Jelke G., Keller Wouter J., and Pannekoek Jeroen. 1990. Disclosure control of microdata. Journal of the American Statistical Association 85, 409 (1990), 38–45.Google ScholarCross Ref
[16] Blanco-Justicia Alberto, Sanchez David, Domingo-Ferrer Josep, and Muralidhar Krishnamurty. 2022. A critical review on the use (and misuse) of differential privacy in machine learning. arXiv preprint arXiv:2206.04621 (2022).Google Scholar
[17] Boedihardjo March, Strohmer Thomas, and Vershynin Roman. 2022. Private sampling: A noiseless approach for generating differentially private synthetic data. SIAM Journal on Mathematics of Data Science 4, 3 (2022), 1082–1115.Google ScholarCross Ref
[18] Brand Ruth. 2002. Microdata protection through noise addition. In Inference Control in Statistical Databases. Springer, 97–116.Google ScholarCross Ref
[19] Brickell Justin and Shmatikov Vitaly. 2008. The cost of privacy: Destruction of data-mining utility in anonymized data publishing. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 70–78.Google ScholarDigital Library
[20] Budiardjo W. Widodo, Eko Kuswardono, and Wahyu Catur Wibowo. 2019. Privacy preserving data publishing with multiple sensitive attributes based on overlapped slicing. Information 10, 12 (2019), 362.Google ScholarCross Ref
[21] Buratović Ines, Miličević Mario, and Žubrinić Krunoslav. 2012. Effects of data anonymization on the data mining results. In Proceedings of the 2012 35th International Convention MIPRO. IEEE, Los Alamitos, CA, 1619–1623.Google Scholar
[22] Cao Jianneng and Karras Panagiotis. 2012. Publishing microdata with a robust privacy guarantee. Proceedings of the VLDB Endowment 5, 11 (2012), 1388–1399.Google Scholar
[23] Carvalho Tânia, Faria Pedro, Antunes Luís, and Moniz Nuno. 2021. Fundamental privacy rights in a pandemic state. PLoS One 16, 6 (2021), e0252169.Google ScholarCross Ref
[24] Carvalho Tânia and Moniz Nuno. 2021. The compromise of data privacy in predictive performance. In Advances in Intelligent Data Analysis XIX, Abreu Pedro Henriques, Rodrigues Pedro Pereira, Fernández Alberto, and Gama João (Eds.). Springer International Publishing, Cham, Switzerland, 426–438.Google ScholarDigital Library
[25] Carvalho Tânia, Moniz Nuno, Faria Pedro, and Antunes Luís. 2022. Towards a data privacy-predictive performance trade-off. arxiv:2201.05226 [cs.LG] (2022).Google Scholar
[26] Carvalho Tânia, Moniz Nuno, Faria Pedro, Antunes Luís, and Chawla Nitesh. 2022. Privacy-preserving data synthetisation for secure information sharing. arXiv preprint arXiv:2212.00484 (2022).Google Scholar
[27] Chawla Nitesh V., Bowyer Kevin W., Hall Lawrence O., and Kegelmeyer W. Philip. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16 (2002), 321–357.Google ScholarDigital Library
[28] Choi Edward, Biswal Siddharth, Malin Bradley, Duke Jon, Stewart Walter F., and Sun Jimeng. 2017. Generating multi-label discrete patient records using generative adversarial networks. In Proceedings of the Machine Learning for Healthcare Conference. 286–305.Google Scholar
[29] Group Cornell Database. 2009. Cornell Anonymization Toolkit. Retrieved November 1, 2021 from https://sourceforge.net/projects/anony-toolkit/.Google Scholar
[30] Europe Council of. 1981. Convention for the Protection of Individuals with Regard to Automatic Processing of Personal Data. Retrieved December 1, 2022 from https://rm.coe.int/1680078b37.Google Scholar
[31] Cox Lawrence H.. 1980. Suppression methodology and statistical disclosure control. Journal of the American Statistical Association 75, 370 (1980), 377–385.Google ScholarCross Ref
[32] Crato Nuno and Paruolo Paolo. 2019. The power of microdata: An introduction. In Data-Driven Policy Impact Evaluation. Springer, Cham, Switzerland, 1–14.Google ScholarCross Ref
[33] Cunha Mariana, Mendes Ricardo, and Vilela João P.. 2021. A survey of privacy-preserving mechanisms for heterogeneous data types. Computer Science Review 41 (2021), 100403.Google ScholarDigital Library
[34] Dalenius Tore. 1981. A simple procedure for controlled rounding. Statistik Tidskrift 3 (1981), 202–208.Google Scholar
[35] Dalenius Tore and Reiss Steven P.. 1982. Data-swapping: A technique for disclosure control. Journal of Statistical Planning and Inference 6, 1 (1982), 73–85.Google ScholarCross Ref
[36] Dandekar Ramesh A., Domingo-Ferrer Josep, and Sebé Francesc. 2002. LHS-based hybrid microdata vs rank swapping and microaggregation for numeric microdata protection. In Inference Control in Statistical Databases. Springer, 153–162.Google Scholar
[37] Danezis George, Domingo-Ferrer Josep, Hansen Marit, Hoepman Jaap-Henk, Métayer Daniel Le, Tirtea Rodica, and Schiffner Stefan. 2014. Privacy and Data Protection by Design—From Policy to Engineering. European Union Agency for Network and Information Security (ENISA), Heraklion, Greece.Google Scholar
[38] Dankar Fida Kamal, Emam Khaled El, Neisa Angelica, and Roffey Tyson. 2012. Estimating the re-identification risk of clinical data sets. BMC Medical Informatics and Decision Making 12, 1 (2012), 1–15.Google ScholarCross Ref
[39] Dankar Fida K. and Ibrahim Mahmoud. 2021. Fake it till you make it: Guidelines for effective synthetic data generation. Applied Sciences 11, 5 (2021), 2158.Google ScholarCross Ref
[40] Davies David L. and Bouldin Donald W.. 1979. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence2 (1979), 224–227.Google ScholarDigital Library
[41] Waal A. G. De, Hundepool A. J., and Willenborg L. C. R. J.. 1996. Argus: Software for statistical disclosure control of microdata. In Proceedings of the 1996 Annual Research Conference.Google Scholar
[42] Waal Ton De and Willenborg Leon Cornelis Roelof Johannes. 1996. A view on statistical disclosure control for microdata. Survey Methodology 22, 1 (1996), 95–103.Google Scholar
[43] Defays D. and Nanopoulos P.. 1993. Panels of enterprises and confidentiality: The small aggregates method. In Proceedings of the 1992 Symposium on Design and Analysis of Longitudinal Surveys. 195–204.Google Scholar
[44] Domingo-Ferrer Josep. 2008. A survey of inference control methods for privacy-preserving data mining. In Privacy-Preserving Data Mining. Springer, 53–80.Google Scholar
[45] Domingo-Ferrer Josep, Farras Oriol, Ribes-González Jordi, and Sánchez David. 2019. Privacy-preserving cloud computing on sensitive data: A survey of methods, products and challenges. Computer Communications 140 (2019), 38–60.Google ScholarDigital Library
[46] Domingo-Ferrer Josep and González-Nicolás Úrsula. 2010. Hybrid microdata using microaggregation. Information Sciences 180, 15 (2010), 2834–2844.Google ScholarDigital Library
[47] Domingo-Ferrer Josep, Martínez-Ballesté Antoni, Mateo-Sanz Josep Maria, and Sebé Francesc. 2006. Efficient multivariate data-oriented microaggregation. VLDB Journal 15, 4 (2006), 355–369.Google ScholarDigital Library
[48] Domingo-Ferrer Josep and Mateo-Sanz Josep Maria. 2002. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14, 1 (2002), 189–201.Google ScholarDigital Library
[49] Domingo-Ferrer Josep, Mateo-Sanz Josep M., and Torra Vincenc. 2001. Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In Pre-Proceedings of ETK-NTTS, Vol. 2. 807–826.Google Scholar
[50] Domingo-Ferrer Josep, Oganian Anna, Torres Àngel, and Mateo-Sanz Josep M.. 2002. On the security of microaggregation with individual ranking: Analytical attacks. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 477–491.Google ScholarDigital Library
[51] Domingo-Ferrer Josep, Sánchez David, and Soria-Comas Jordi. 2016. Database anonymization: Privacy models, data utility, and microaggregation-based inter-model connections. Synthesis Lectures on Information Security, Privacy, and Trust 8, 1 (2016), 1–136.Google ScholarCross Ref
[52] Domingo-Ferrer Josep and Torra Vicenc. 2001. Disclosure control methods and information loss for microdata. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies 2001 (2001), 91–110.Google Scholar
[53] Domingo-Ferrer Josep and Torra Vicenç. 2002. Distance-based and probabilistic record linkage for re-identification of records with categorical variables. Butlletí de lACIA, Associació Catalana dIntelligència Artificial 2002 (2002), 243–250.Google Scholar
[54] Domingo-Ferrer Josep and Torra Vicenç. 2004. Disclosure risk assessment in statistical data protection. Journal of Computational and Applied Mathematics 164 (2004), 285–293.Google ScholarDigital Library
[55] George Duncan and Stephen Roehrig. 2001. Disclosure limitation methods and information loss for tabular data. Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies 2001 (2001), 135–166.Google Scholar
[56] Dupriez Olivier and Boyko Ernie. 2010. Dissemination of Microdata Files: Principles Procedures and Practices. International Household Survey Network.Google Scholar
[57] Dwork Cynthia. 2006. Differential privacy. In Automata, Languages and Programming. Lecture Notes in Computer Science, Vol. 4052. Springer, 1–12.Google ScholarDigital Library
[58] Emam Khaled El and Dankar Fida Kamal. 2008. Protecting privacy using k-anonymity. Journal of the American Medical Informatics Association 15, 5 (2008), 627–637.Google ScholarCross Ref
[59] Elliot Mark J., Manning Anna M., and Ford Rupert W.. 2002. A computational algorithm for handling the special uniques problem. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 493–509.Google ScholarDigital Library
[60] Commission European. 2014. Guidelines on output checking. CROS. Retrieved November 1, 2022 from https://ec.europa.eu/eurostat/cros/content/guidelines-output-checking_en.Google Scholar
[61] Commission European. 2014. Opinion 05/2014 on Anonymisation Techniques. Retrieved February 5, 2021 from https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf.Google Scholar
[62] Commission European. 2017. Guidelines on Personal Data Breach Notification Under Regulation 2016/679 (wp250rev.01). Retrieved September 1, 2021 from https://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=612052.Google Scholar
[63] Commission European. 2021. Statistical Disclosure Control for Business Microdata. Retrieved September 1, 2021 from https://ec.europa.eu/eurostat/documents/54610/7779382/Statistical-Disclosure-Control-in-business-statistics.pdf.Google Scholar
[64] Commission European. 2022. Microdata Access. Retrieved November 1, 2022 from https://ec.europa.eu/eurostat/cros/content/microdata-access_en.Google Scholar
[65] Board European Data Protection. 2021. Guidelines 07/2020 on the Concepts of Controller and Processor in the GDPR. Retrieved October 1, 2021 from https://edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-072020-concepts-controller-and-processor-gdpr_en.Google Scholar
[66] Supervisor European Data Protection. 2022. Accountability. Retrieved December 1, 2022 from https://edps.europa.eu/data-protection/our-work/subjects/accountability_en.Google Scholar
[67] Union European. 1995. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. EUR-Lex. Retrieved December 1, 2022 from https://eur-lex.europa.eu/eli/dir/1995/46/oj.Google Scholar
[68] Ewens Warren John. 1990. Population genetics theory—The past and the future. In Mathematical and Statistical Developments of Evolutionary Theory. Springer, 177–227.Google ScholarCross Ref
[69] Fadel Augusto César, Ochi Luiz Satoru, Brito José André de Moura, and Semaan Gustavo Silva. 2021. Microaggregation heuristic applied to statistical disclosure control. Information Sciences 548 (2021), 37–55.Google ScholarCross Ref
[70] Fang Mei Ling, Dhami Devendra Singh, and Kersting Kristian. 2022. DP-CTGAN: Differentially private medical data generation using CTGANs. In Proceedings of the International Conference on Artificial Intelligence in Medicine. 178–188.Google ScholarDigital Library
[71] Fellegi Ivan P. and Sunter Alan B.. 1969. A theory for record linkage. Journal of the American Statistical Association 64, 328 (1969), 1183–1210.Google ScholarCross Ref
[72] Fienberg Stephen E. and McIntyre Julie. 2004. Data swapping: Variations on a theme by Dalenius and Reiss. In Privacy in Statistical Databases, Domingo-Ferrer Josep and Torra Vicenç (Eds.). Springer, Berlin, Germany, 14–29.Google ScholarCross Ref
[73] Figueira Alvaro and Vaz Bruno. 2022. Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10, 15 (2022), 2733.Google Scholar
[74] Fiore Marco, Katsikouli Panagiota, Zavou Elli, Cunche Mathieu, Fessant Françoise, Hello Dominique Le, Aïvodji Ulrich Matchi, Olivier Baptiste, Quertier Tony, and Stanica Razvan. 2019. Privacy of trajectory micro-data: A survey. arxiv:1903.12211 (2019).Google Scholar
[75] Fletcher Sam and Islam Md. Zahidul. 2015. Measuring information quality for privacy preserving data mining. International Journal of Computer Theory and Engineering 7, 1 (2015), 21.Google ScholarCross Ref
[76] Foschi Flavio. 2011. Disclosure risk for high dimensional business microdata. In Proceedings of the Joint UNECE-Eurostat Work Session on Statistical Data Confidentiality.26–28.Google Scholar
[77] Fowlkes Edward B. and Mallows Colin L.. 1983. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association 78, 383 (1983), 553–569.Google ScholarCross Ref
[78] Fredrikson Matthew, Lantz Eric, Jha Somesh, Lin Simon, Page David, and Ristenpart Thomas. 2014. Privacy in pharmacogenetics: An end-to-end study of personalized warfarin dosing. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security’14). 17–32.Google Scholar
[79] Fung Benjamin C. M., Wang Ke, Chen Rui, and Yu Philip S.. 2010. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys 42, 4 (2010), 1–53.Google ScholarDigital Library
[80] Fung Benjamin C. M., Wang Ke, Fu Ada Wai-Chee, and Philip S. Yu. 2010. Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques. CRC Press, Boca Raton, FL.Google ScholarCross Ref
[81] Fung Benjamin C. M., Wang Ke, Wang Lingyu, and Debbabi Mourad. 2008. A framework for privacy-preserving cluster analysis. In Proceedings of the 2008 IEEE International Conference on Intelligence and Security Informatics. IEEE, Los Alamitos, CA, 46–51.Google ScholarCross Ref
[82] Fung Benjamin C. M., Wang Ke, Wang Lingyu, and Hung Patrick C. K.. 2009. Privacy-preserving data publishing for cluster analysis. Data & Knowledge Engineering 68, 6 (2009), 552–575.Google ScholarDigital Library
[83] Fung Benjamin C. M., Wang Ke, and Yu Philip S.. 2005. Top-down specialization for information and privacy preservation. In Proceedings of the 21st International Conference on Data Engineering (ICDE’05). IEEE, Los Alamitos, CA, 205–216.Google ScholarDigital Library
[84] Gallacher Guillermo and Hossain Iqbal. 2020. Remote work and employment dynamics under COVID-19: Evidence from Canada. Canadian Public Policy 46, S1 (2020), 44–54.Google ScholarCross Ref
[85] Gardner Lauren, Ratcliff Jeremy, Dong Ensheng, and Katz Aaron. 2021. A need for open public data standards and sharing in light of COVID-19. Lancet Infectious Diseases 21, 4 (2021), e80.Google ScholarCross Ref
[86] Gouweleeuw José, Kooiman Peter, Willenborg Leon, and Wolf Paul P. de. 1998. Post randomisation for statistical disclosure control: Theory and implementation. Journal of Official Statistics 14, 4 (1998), 463.Google Scholar
[87] Gretel. 2019. Gretel. Accessed December 1, 2022 from https://gretel.ai/.Google Scholar
[88] Gretel. 2020. Gretel Synthetics. Retrieved December 1, 2022 from https://github.com/gretelai/gretel-synthetics.Google Scholar
[89] Hall Rob and Fienberg Stephen E.. 2010. Privacy-preserving record linkage. In Proceedings of the International Conference on Privacy in Statistical Databases. 269–283.Google ScholarCross Ref
[90] Han Jianmin, Luo Fangwei, Lu Jianfeng, and Peng Hao. 2013. SLOMS: A privacy preserving data publishing method for multiple sensitive attributes microdata. Journal of Software 8, 12 (2013), 3096–3104.Google ScholarCross Ref
[91] Hansen Stephen Lee and Mukherjee Sumitra. 2003. A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering 15, 4 (2003), 1043–1044.Google ScholarDigital Library
[92] Hardt Moritz, Ligett Katrina, and McSherry Frank. 2012. A simple and practical algorithm for differentially private data release. In Advances in Neural Information Processing Systems 25.Google Scholar
[93] Hasan A. S. M. Touhidul, Jiang Qingshan, Luo Jun, Li Chengming, and Chen Lifei. 2016. An effective value swapping method for privacy preserving data publishing. Security and Communication Networks 9, 16 (2016), 3219–3228.Google ScholarDigital Library
[94] He Xianmang, Xiao Yanghua, Li Yujia, Wang Qing, Wang Wei, and Shi Baile. 2012. Permutation anonymization: Improving anatomy for privacy preservation in data publication. In New Frontiers in Applied Data Mining, Cao Longbing, Huang Joshua Zhexue, Bailey James, Koh Yun Sing, and Luo Jun (Eds.). Springer, Berlin, Germany, 111–123.Google Scholar
[95] Heer G. R.. 1993. A bootstrap procedure to preserve statistical confidentiality in contingency tables. In Proceedings of the International Seminar on Statistical Confidentiality. 261–271.Google Scholar
[96] Herzog Thomas N., Scheuren Fritz J., and Winkler William E.. 2007. Data Quality and Record Linkage Techniques. Springer Science & Business Media.Google ScholarDigital Library
[97] Hittmeir Markus, Ekelhart Andreas, and Mayer Rudolf. 2019. On the utility of synthetic data: An empirical evaluation on machine learning tasks. In Proceedings of the 14th International Conference on Availability, Reliability, and Security. 1–6.Google ScholarDigital Library
[98] Hittmeir Markus, Ekelhart Andreas, and Mayer Rudolf. 2019. Utility and privacy assessments of synthetic data for regression tasks. In 2019 IEEE International Conference on Big Data (Big Data). IEEE, 5763–5772.Google ScholarCross Ref
[99] Hoffman Lance J.. 1969. Computers and privacy: A survey. ACM Computing Surveys 1, 2 (1969), 85–103.Google ScholarDigital Library
[100] Hoshino Nobuaki. 2001. Applying Pitman’s sampling formula to microdata disclosure risk assessment. Journal of Official Statistics 17, 4 (2001), 499.Google Scholar
[101] Humbert Mathias, Trubert Benjamin, and Huguenin Kévin. 2019. A survey on interdependent privacy. ACM Computing Surveys 52, 6 (2019), 1–40.Google ScholarDigital Library
[102] Hundepool Anco, Domingo-Ferrer Josep, Franconi Luisa, Giessing Sarah, Lenz Rainer, Longhurst Jane, Nordholt E. Schulte, Seri Giovanni, and Wolf P.. 2010. Handbook on Statistical Disclosure Control. ESSnet on Statistical Disclosure Control.Google Scholar
[103] Hundepool Anco, Domingo-Ferrer Josep, Franconi Luisa, Giessing Sarah, Nordholt Eric Schulte, Spicer Keith, and Wolf Peter-Paul De. 2012. Statistical Disclosure Control. Vol. 2. Wiley, New York, NY.Google ScholarCross Ref
[104] Hurkens C. A. J. and Tiourine S. R.. 1998. Models and methods for the microdata protection problem. Journal of Official Statistics 14, 4 (1998), 437.Google Scholar
[105] Hutter Frank, Kotthoff Lars, and Vanschoren Joaquin (Eds.). 2018. Automated Machine Learning: Methods, Systems, Challenges. Springer.Google Scholar
[106] Ichim Daniela. 2009. Disclosure control of business microdata: A density-based approach. International Statistical Review 77, 2 (2009), 196–211.Google ScholarCross Ref
[107] Iftikhar Masooma, Wang Qing, and Lin Yu. 2019. Publishing differentially private datasets via stable microaggregation. In Proceedings of the 22nd International Conference on Extending Database Technology (EDBT’19). 662–665.Google Scholar
[108] Inan Ali, Kantarcioglu Murat, and Bertino Elisa. 2009. Using anonymized data for classification. In Proceedings of the 2009 IEEE 25th International Conference on Data Engineering. IEEE, Los Alamitos, CA, 429–440.Google ScholarDigital Library
[109] Office Information Commissioner’s. 2022. Accountability and governance. ICO. Retrieved December 1, 2022 from https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/accountability-and-governance/.Google Scholar
[110] Office Information Commissioner’s. 2022. What does it mean if you are a controller? ICO. Retrieved December 1, 2022 from https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/controllers-and-processors/what-does-it-mean-if-you-are-a-controller/.Google Scholar
[111] Ito Shinsuke and Hoshino Naomi. 2014. Data swapping as a more efficient tool to create anonymized census microdata in Japan. In Proceedings of Privacy in Statistical Databases. 1–14.Google Scholar
[112] Ito Shinsuke, Yoshitake Toru, Kikuchi Ryo, and Akutsu Fumika. 2018. Comparative study of the effectiveness of perturbative methods for creating official microdata in Japan. In Privacy in Statistical Databases, Domingo-Ferrer Josep and Montes Francisco (Eds.). Springer International Publishing, Cham, Switzerland, 200–214.Google ScholarDigital Library
[113] Iyengar Vijay S.. 2002. Transforming data to satisfy privacy constraints. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 279–288.Google ScholarDigital Library
[114] Jaro Matthew A.. 1989. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association 84, 406 (1989), 414–420.Google ScholarCross Ref
[115] Jordon James, Yoon Jinsung, and Schaar Mihaela Van Der. 2018. PATE-GAN: Generating synthetic data with differential privacy guarantees. In Proceedings of the International Conference on Learning Representations.Google Scholar
[116] Jung Gyuwon, Lee Hyunsoo, Kim Auk, and Lee Uichin. 2020. Too much information: Assessing privacy risks of contact trace data disclosure on people with COVID-19 in South Korea. Frontiers in Public Health 8 (2020), 305.Google Scholar
[117] Kent Allen, Berry Madeline M., Luehrs Fred U., and Perry J. W.. 1955. Machine literature searching VIII. Operational criteria for designing information retrieval systems. American Documentation 6, 2 (1955), 93–101.Google ScholarCross Ref
[118] Kifer Daniel and Gehrke Johannes. 2006. Injecting utility into anonymized datasets. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. 217–228.Google ScholarDigital Library
[119] Kim Jay J.. 1986. A method for limiting disclosure in microdata based on random noise and transformation. In Proceedings of the Section on Survey Research Methods. American Statistical Association, Alexandria, VA, 303–308.Google Scholar
[120] Kotal Anantaa, Piplai Aritran, Chukkapalli Sai Sree Laya, and Joshi Anupam. 2022. PriveTAB: Secure and privacy-preserving sharing of tabular data. In Proceedings of the 2022 ACM on International Workshop on Security and Privacy Analytics. 35–45.Google ScholarDigital Library
[121] Kowarik A., Templ M., Meindl B., and Fonteneau F.. 2013. sdcMicroGUI: Graphical user interface for package sdcMicro. Retrieved April 5, 2023 from https://rdrr.io/cran/sdcMicroGUI/.Google Scholar
[122] Kubat Miroslav, Holte Robert C., and Matwin Stan. 1998. Machine learning for the detection of oil spills in satellite radar images. Machine Learning 30, 2 (1998), 195–215.Google ScholarDigital Library
[123] Kullback Solomon and Leibler Richard A. 1951. On information and sufficiency. Annals of Mathematical Statistics 22, 1 (1951), 79–86.Google ScholarCross Ref
[124] Kunar Aditya. 2021. Effective and privacy preserving tabular data synthesizing. arXiv preprint arXiv:2108.10064 (2021).Google Scholar
[125] Laszlo Michael and Mukherjee Sumitra. 2005. Minimum spanning tree partitioning algorithm for microaggregation. IEEE Transactions on Knowledge and Data Engineering 17, 7 (2005), 902–911.Google ScholarDigital Library
[126] Laszlo Michael and Mukherjee Sumitra. 2009. Approximation bounds for minimum information loss microaggregation. IEEE Transactions on Knowledge and Data Engineering 21, 11 (2009), 1643–1647.Google ScholarDigital Library
[127] Lee Jaewoo and Clifton Chris. 2011. How much is enough? Choosing \(\varepsilon\) for differential privacy. In Proceedings of the International Conference on Information Security. 325–340.
[128] LeFevre Kristen, DeWitt David J., and Ramakrishnan Raghu. 2005. Incognito: Efficient full-domain k-anonymity. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 49–60.
[129] LeFevre Kristen, DeWitt David J., and Ramakrishnan Raghu. 2006. Mondrian multidimensional k-anonymity. In Proceedings of the 22nd International Conference on Data Engineering (ICDE’06). IEEE, Los Alamitos, CA, 25–25.
[130] LeFevre Kristen, DeWitt David J., and Ramakrishnan Raghu. 2006. Workload-aware anonymization. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 277–286.
[131] Li Boyu, He Kun, and Sun Geng. 2023. Local generalization and bucketization technique for personalized privacy preservation. Journal of King Saud University: Computer and Information Sciences 35, 1 (2023), 393–404.
[132] Li Boyu, Liu Yanheng, Han Xu, and Zhang Jindong. 2017. Cross-bucket generalization for information and privacy preservation. IEEE Transactions on Knowledge and Data Engineering 30, 3 (2017), 449–459.
[133] Li Jiuyong, Liu Jixue, Baig Muzammil, and Wong Raymond Chi-Wing. 2011. Information based data anonymization for classification utility. Data & Knowledge Engineering 70, 12 (2011), 1030–1045.
[134] Li Jiexun, Wang G. Alan, and Chen Hsinchun. 2011. Identity matching using personal and social identity features. Information Systems Frontiers 13, 1 (2011), 101–113.
[135] Li Ninghui, Li Tiancheng, and Venkatasubramanian Suresh. 2007. T-closeness: Privacy beyond k-anonymity and l-diversity. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering. IEEE, Los Alamitos, CA, 106–115.
[136] Li Tiancheng and Li Ninghui. 2009. On the tradeoff between privacy and utility in data publishing. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 517–526.
[137] Li Tiancheng, Li Ninghui, Zhang Jian, and Molloy Ian. 2010. Slicing: A new approach for privacy preserving data publishing. IEEE Transactions on Knowledge and Data Engineering 24, 3 (2010), 561–574.
[138] Liao Dan, Li Hui, Sun Gang, Zhang Ming, and Chang Victor. 2018. Location and trajectory privacy preservation in 5G-enabled vehicle social network services. Journal of Network and Computer Applications 110 (2018), 108–118.
[139] Lin Jun-Lin, Chang Pei-Chann, Liu Julie Yu-Chih, and Wen Tsung-Hsien. 2010. Comparison of microaggregation approaches on anonymized data quality. Expert Systems with Applications 37, 12 (2010), 8161–8165.
[140] Little Roderick J. A. 1993. Statistical analysis of masked data. Journal of Official Statistics 9, 2 (1993), 407.
[141] Little Roderick J. A., Liu Fang, and Raghunathan Trivellore E. 2004. Statistical disclosure techniques based on multiple imputation. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives: An Essential Journey with Donald Rubin’s Statistical Family, Andrew Gelman and Xiao-Li Meng (Eds.). Wiley, 141–152.
[142] Liu Jiaxiang, Oya Simon, and Kerschbaum Florian. 2021. Generalization techniques empirically outperform differential privacy against membership inference. arXiv preprint arXiv:2110.05524 (2021). https://arxiv.org/abs/2110.05524.
[143] Liu Kun, Liu Wenyan, Cheng Junhong, and Lu Xingjian. 2019. UHRP: Uncertainty-based pruning method for anonymized data linear regression. In Proceedings of the International Conference on Database Systems for Advanced Applications. 19–33.
[144] Liu Tianen, Wang Yingjie, Cai Zhipeng, Tong Xiangrong, Pan Qingxian, and Zhao Jindong. 2020. A dynamic privacy protection mechanism for spatiotemporal crowdsourcing. Security and Communication Networks 2020 (2020), 1–14.
[145] Liu Yi, Peng Jialiang, James J. Q., and Wu Yi. 2019. PPGAN: Privacy-preserving generative adversarial network. In Proceedings of the 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS’19). IEEE, Los Alamitos, CA, 985–989.
[146] Lyu Lingjuan, Law Yee Wei, Ng Kee Siong, Xue Shibei, Zhao Jun, Yang Mengmeng, and Liu Lei. 2020. Towards distributed privacy-preserving prediction. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC’20). IEEE, Los Alamitos, CA, 4179–4184.
[147] Machanavajjhala Ashwin, Kifer Daniel, Abowd John, Gehrke Johannes, and Vilhuber Lars. 2008. Privacy: Theory meets practice on the map. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering. IEEE, Los Alamitos, CA, 277–286.
[148] Machanavajjhala Ashwin, Kifer Daniel, Gehrke Johannes, and Venkitasubramaniam Muthuramakrishnan. 2007. l-Diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data 1, 1 (2007), 3–es.
[149] Mackey Elaine, Elliot Mark, and O’Hara Kieron. 2016. The Anonymisation Decision-Making Framework. UKAN Publications.
[150] Majeed Abdul and Lee Sungchang. 2021. Anonymization techniques for privacy preserving data publishing: A comprehensive survey. IEEE Access 9 (2021), 8512–8545.
[151] Manning Anna M., Haglin David J., and Keane John A. 2008. A recursive search algorithm for statistical disclosure assessment. Data Mining and Knowledge Discovery 16, 2 (2008), 165–196.
[152] Martínez Sergio, Sánchez David, and Valls Aida. 2012. Semantic adaptive microaggregation of categorical microdata. Computers & Security 31, 5 (2012), 653–672.
[153] Mateo-Sanz Josep Maria, Sebé Francesc, and Domingo-Ferrer Josep. 2004. Outlier protection in continuous microdata masking. In Proceedings of the International Workshop on Privacy in Statistical Databases. 201–215.
[154] Matthews Gregory J. and Harel Ofer. 2011. Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy. Statistics Surveys 5 (2011), 1–29.
[155] Matwin Stan, Nin Jordi, Sehatkar Morvarid, and Szapiro Tomasz. 2015. A review of attribute disclosure control. In Advanced Research in Data Privacy. Studies in Computational Intelligence, Vol. 567. Springer, 41–61.
[156] Mitchell Margaret, Wu Simone, Zaldivar Andrew, Barnes Parker, Vasserman Lucy, Hutchinson Ben, Spitzer Elena, Raji Inioluwa Deborah, and Gebru Timnit. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 220–229.
[157] Mivule Kato. 2013. Utilizing noise addition for data privacy, an overview. arXiv preprint arXiv:1309.3958 (2013).
[158] Mivule Kato and Turner Claude. 2013. A comparative analysis of data privacy and utility parameter adjustment, using machine learning classification as a gauge. Procedia Computer Science 20 (2013), 414–419.
[159] Mivule Kato, Turner Claude, and Ji Soo-Yeon. 2012. Towards a differential privacy and utility preserving machine learning classifier. Procedia Computer Science 12 (2012), 176–181.
[160] Mohammed Noman, Fung Benjamin C. M., Hung Patrick C. K., and Lee Cheuk-Kwong. 2010. Centralized and distributed anonymization for high-dimensional healthcare data. ACM Transactions on Knowledge Discovery from Data 4, 4 (2010), 1–33.
[161] Moore Richard. 1996. Controlled Data-Swapping Techniques for Masking Public Use Microdata Sets. U.S. Census Bureau.
[162] MOSTLY AI. 2017. MOSTLY AI. Retrieved December 1, 2022 from https://mostly.ai/.
[163] MOSTLY AI. 2020. Virtual Data Lab (VDL). Retrieved December 1, 2022 from https://github.com/mostly-ai/virtualdatalab.
[164] Muralidhar Krishnamurty and Domingo-Ferrer Josep. 2016. Rank-based record linkage for re-identification risk assessment. In Proceedings of the International Conference on Privacy in Statistical Databases. 225–236.
[165] Muralidhar Krishnamurty, Domingo-Ferrer Josep, and Martínez Sergio. 2020. \(\epsilon\)-Differential privacy for microdata releases does not guarantee confidentiality (let alone utility). In Proceedings of the International Conference on Privacy in Statistical Databases. 21–31.
[166] Muralidhar Krishnamurty and Sarathy Rathindra. 2003. A theoretical basis for perturbation methods. Statistics and Computing 13, 4 (2003), 329–335.
[167] Muralidhar Krishnamurty and Sarathy Rathindra. 2003. A rejoinder to the comments by Polettini and Stander. Statistics and Computing 13, 4 (2003), 339–342.
[168] Muralidhar Krishnamurty and Sarathy Rathindra. 2006. Data shuffling: A new masking approach for numerical data. Management Science 52, 5 (2006), 658–670.
[169] Muralidhar Krish, Sarathy Rathindra, and Dandekar Ramesh. 2006. Why swap when you can shuffle? A comparison of the proximity swap and data shuffle for numeric data. In Proceedings of the International Conference on Privacy in Statistical Databases. 164–176.
[170] Murray Jeffrey Jr., Mashhadi Afra, Lagesse Brent, and Stiber Michael. 2021. Privacy preserving techniques applied to CPNI data: Analysis and recommendations. arXiv preprint arXiv:2101.09834 (2021).
[171] Nanni Mirco, Andrienko Gennady, Barabási Albert-László, Boldrini Chiara, Bonchi Francesco, Cattuto Ciro, Chiaromonte Francesca, et al. 2021. Give more data, awareness and control to individual citizens, and they will help COVID-19 containment. Ethics and Information Technology 23, 1 (2021), 1–6.
[172] Narayanan Arvind and Shmatikov Vitaly. 2008. Robust de-anonymization of large sparse datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy (SP’08). IEEE, Los Alamitos, CA, 111–125.
[173] Nawaz Asif and Kazemian Hassan. 2021. A fuzzy approach to identity resolution. In Proceedings of the International Conference on Engineering Applications of Neural Networks. 307–318.
[174] Nayak Tapan K., Sinha Bimal, and Zayatz Laura. 2011. Statistical properties of multiplicative noise masking for confidentiality protection. Journal of Official Statistics 27, 3 (2011), 527.
[175] Nergiz Mehmet Ercan, Atzori Maurizio, and Clifton Chris. 2007. Hiding the presence of individuals from shared databases. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. 665–676.
[176] Nergiz M. Ercan and Clifton Chris. 2007. Thoughts on k-anonymization. Data & Knowledge Engineering 63, 3 (2007), 622–645.
[177] Nin Jordi, Herranz Javier, and Torra Vicenç. 2008. Rethinking rank swapping to decrease disclosure risk. Data & Knowledge Engineering 64, 1 (2008), 346–364.
[178] Nowok Beata. 2015. Utility of synthetic microdata generated using tree-based methods. In Proceedings of the UNECE Statistical Data Confidentiality Work Session. 1–11.
[179] Ochoa Salvador, Rasmussen Jamie, Robson Christine, and Salib Michael. 2001. Reidentification of Individuals in Chicago’s Homicide Database: A Technical and Legal Study. Massachusetts Institute of Technology, Cambridge, MA.
[180] Ohm Paul. 2009. Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review 57 (2009), 1701.
[181] Ohno-Machado Lucila, Vinterbo Staal, and Dreiseitl Stephan. 2002. Effects of data anonymization by cell suppression on descriptive statistics and predictive modeling performance. Journal of the American Medical Informatics Association 9, Suppl. 6 (2002), 115–119.
[182] Oliveira Stanley R. M. and Zaiane Osmar R. 2010. Privacy preserving clustering by data transformation. Journal of Information and Data Management 1, 1 (2010), 37.
[183] OpenAIRE. 2021. Amnesia. Retrieved November 1, 2021 from https://amnesia.openaire.eu.
[184] Orooji Marmar and Knapp Gerald M. 2019. Improving suppression to reduce disclosure risk and enhance data utility. arXiv preprint arXiv:1901.00716 (2019).
[185] Pagliuca D. and Seri G. 1999. Some Results of Individual Ranking Method on the System of Enterprise Accounts Annual Survey. Esprit SDC Project, Deliverable MI-3/S1. Esprit.
[186] Patki Neha, Wedge Roy, and Veeramachaneni Kalyan. 2016. The synthetic data vault. In Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA’16). 399–410.
[187] Peiffer-Smadja Nathan, Maatoug Redwan, Lescure François-Xavier, D’ortenzio Eric, Pineau Joëlle, and King Jean-Rémi. 2020. Machine learning for COVID-19 needs global collaboration and data-sharing. Nature Machine Intelligence 2, 6 (2020), 293–294.
[188] Ping Haoyue, Stoyanovich Julia, and Howe Bill. 2017. DataSynthesizer: Privacy-preserving synthetic datasets. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management. 1–5.
[189] Prasser Fabian, Eicher Johanna, Spengler Helmut, Bild Raffael, and Kuhn Klaus A. 2020. Flexible data anonymization using ARX: Current status and challenges ahead. Software: Practice and Experience 50, 7 (2020), 1277–1304.
[190] Prasser Fabian, Kohlmayer Florian, and Kuhn Klaus A. 2016. The importance of context: Risk-based de-identification of biomedical data. Methods of Information in Medicine 55, 4 (2016), 347–355.
[191] Radanliev Petar, Roure David De, and Walton Rob. 2020. Data mining and analysis of scientific research data records on Covid-19 mortality, immunity, and vaccine development: In the first wave of the Covid-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews 14, 5 (2020), 1121–1132.
[192] Rand William M. 1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66, 336 (1971), 846–850.
[193] Rankin Debbie, Black Michaela, Bond Raymond, Wallace Jonathan, Mulvenna Maurice, and Gorka Epelde. 2020. Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8, 7 (2020), e18910.
[194] Reiter Jerome P. 2005. Estimating risks of identification disclosure in microdata. Journal of the American Statistical Association 100, 472 (2005), 1103–1112.
[195] Reiter Jerome P. 2005. Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21, 3 (2005), 441.
[196] Rijsbergen C. J. Van. 1979. Information Retrieval. Butterworth-Heinemann.
[197] Ritchie Felix. 2009. UK release practices for official microdata. Statistical Journal of the IAOS 26, 3, 4 (2009), 103–111.
[198] Rocher Luc, Hendrickx Julien M., and Montjoye Yves-Alexandre De. 2019. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications 10, 1 (2019), 1–9.
[199] Rockett Ian R. H., Caine Eric D., Connery Hilary S., D’Onofrio Gail, Gunnell David J., Miller Ted R., Nolte Kurt B., et al. 2018. Discerning suicide in drug intoxication deaths: Paucity and primacy of suicide notes and psychiatric history. PLoS One 13, 1 (2018), e0190200.
[200] Rohilla Shivani and Bhardwaj Manish. 2017. Efficient anonymization algorithms to prevent generalized losses and membership disclosure in microdata. American Journal of Data Mining and Knowledge Discovery 2, 2 (2017), 54–61.
[201] Rosenblatt Lucas, Liu Xiaoyan, Pouyanfar Samira, Leon Eduardo de, Desai Anuj, and Allen Joshua. 2020. Differentially private synthetic data: Applied evaluations and enhancements. arXiv preprint arXiv:2011.05537 (2020).
[202] Rousseeuw Peter J. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20 (1987), 53–65.
[203] Rubin Donald B. 1993. Discussion statistical disclosure limitation. Journal of Official Statistics 9, 2 (1993), 461.
[204] Rustad Michael L. and Koenig Thomas H. 2019. Towards a global data privacy standard. Florida Law Review 71 (2019), 365.
[205] Safe Data Access Professionals Working Group. 2019. Handbook on Statistical Disclosure Control for Outputs. Retrieved November 1, 2022 from https://ukdataservice.ac.uk/app/uploads/thf_datareport_aw_web.pdf.
[206] Samarati Pierangela. 2001. Protecting respondents identities in microdata release. IEEE Transactions on Knowledge and Data Engineering 13, 6 (2001), 1010–1027.
[207] Sari W. Widodo, Irma Permata, and Murien Nugraheni. 2020. ASENVA: Summarizing anatomy model by aggregating sensitive values. In Proceedings of the 2020 International Conference on Electrical Engineering and Informatics (ICELTICs’20). IEEE, Los Alamitos, CA, 1–4.
[208] Skinner C. J. and Holmes David J. 1998. Estimating the re-identification risk per record in microdata. Journal of Official Statistics 14, 4 (1998), 361.
[209] Skinner Chris, Marsh Catherine, Openshaw Stan, and Wymer Colin. 1994. Disclosure control for census microdata. Journal of Official Statistics 10 (1994), 31.
[210] Skinner Chris J. and Elliot M. J. 2002. A measure of disclosure risk for microdata. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64, 4 (2002), 855–867.
[211] Soria-Comas Jordi, Domingo-Ferrer Josep, Sánchez David, and Martínez Sergio. 2014. Enhancing data utility in differential privacy via microaggregation-based k-anonymity. The VLDB Journal 23, 5 (2014), 771–794.
[212] Soria-Comas Jordi, Domingo-Ferrer Josep, Sánchez David, and Martínez Sergio. 2014. Enhancing data utility in differential privacy via microaggregation-based k-anonymity. The VLDB Journal 23, 5 (2014), 771–794.
[213] Spruill Nancy. 1983. The confidentiality and analytic usefulness of masked business microdata. Proceedings of the Section on Survey Research Methods 1983 (1983), 602–607.
[214] Statistics Netherlands. 2014. \(\mu\)-ARGUS. Retrieved November 1, 2021 from https://github.com/sdcTools/muargus.
[215] Sullivan Gary R. 1989. The Use of Added Error to Avoid Disclosure in Microdata Releases. Ph.D. Dissertation. Iowa State University.
[216] Susan V. Shyamala and Christopher T. 2016. Anatomisation with slicing: A new privacy preservation approach for multiple sensitive attributes. SpringerPlus 5, 1 (2016), 1–21.
[217] Sweeney Latanya. 2000. Simple demographics often identify people uniquely. Health (San Francisco) 671, 2000 (2000), 1–34.
[218] Sweeney Latanya. 2002. k-Anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (2002), 557–570.
[219] Takemura Akimichi. 1999. Local Recoding by Maximum Weight Matching for Disclosure Control of Microdata Sets. CIRJE F-Series CIRJE-F-40, CIRJE, Faculty of Economics, University of Tokyo.
[220] Takemura Akimichi. 1999. Some superpopulation models for estimating the number of population uniques. In Proceedings of the Conference on Statistical Data Protection. 45–58.
[221] Tao Yufei, Chen Hekang, Xiao Xiaokui, Zhou Shuigeng, and Zhang Donghui. 2009. Angel: Enhancing the utility of generalization for privacy preserving publication. IEEE Transactions on Knowledge and Data Engineering 21, 7 (2009), 1073–1087.
[222] Templ Matthias, Kowarik Alexander, and Meindl Bernhard. 2015. Statistical disclosure control for micro-data using the R package sdcMicro. Journal of Statistical Software 67, 4 (2015), 1–36.
[223] Templ Matthias and Meindl Bernhard. 2008. Robust statistics meets SDC: New disclosure risk measures for continuous microdata masking. In Proceedings of the International Conference on Privacy in Statistical Databases. 177–189.
[224] Tendick Patrick. 1991. Optimal noise addition for preserving confidentiality in multivariate data. Journal of Statistical Planning and Inference 27, 3 (1991), 341–353.
[225] Torra Vicenç. 2004. Microaggregation for categorical variables: A median based approach. In Proceedings of the International Workshop on Privacy in Statistical Databases. 162–174.
[226] Torra Vicenç. 2017. Privacy models and disclosure risk measures. In Data Privacy: Foundations, New Developments and the Big Data Challenge. Springer, 111–189.
[227] Torra Vicenç. 2022. Guide to Data Privacy: Models, Technologies, Solutions. Springer Nature.
[228] Torra Vicenç, Abowd John M., and Domingo-Ferrer Josep. 2006. Using Mahalanobis distance-based record linkage for disclosure risk assessment. In Proceedings of the International Conference on Privacy in Statistical Databases. 233–242.
[229] Truta Traian Marius, Fotouhi Farshad, and Barth-Jones Daniel. 2006. Global disclosure risk for microdata with continuous attributes. In Privacy and Technologies of Identity. Springer, 349–363.
[230] Truta Traian Marius and Vinay Bindu. 2006. Privacy protection: P-sensitive k-anonymity property. In Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW’06). IEEE, Los Alamitos, CA, 94.
[231] UT Dallas Data Security and Privacy Lab. 2012. UTD Anonymisation ToolBox. Retrieved November 2021 from http://cs.utdallas.edu/dspl/cgi-bin/toolbox/.
[232] Vaidya Jaideep and Clifton Chris. 2004. Privacy-preserving outlier detection. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM’04). IEEE, Los Alamitos, CA, 233–240.
[233] Vanichayavisalsakul Peerapong and Piromsopa Krerk. 2018. An evaluation of anonymized models and ensemble classifiers. In Proceedings of the 2018 2nd International Conference on Big Data and Internet of Things. 18–22.
[234] Wagner Isabel and Eckhoff David. 2018. Technical privacy metrics: A systematic survey. ACM Computing Surveys 51, 3 (2018), 1–38.
[235] Wang Ke and Fung Benjamin C. M. 2006. Anonymizing sequential releases. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 414–423.
[236] Wang Ke, Xu Yabo, Wong Raymond Chi-Wing, and Fu Ada Wai-Chee. 2010. Anonymizing temporal data. In Proceedings of the 2010 IEEE International Conference on Data Mining. IEEE, Los Alamitos, CA, 1109–1114.
[237] Wang Ke, Yu Philip S., and Chakraborty Sourav. 2004. Bottom-up generalization: A data mining solution to privacy protection. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM’04). IEEE, Los Alamitos, CA, 249–256.
[238] Weng Cheng G. and Poon Josiah. 2008. A new evaluation measure for imbalanced datasets. In Proceedings of the 7th Australasian Data Mining Conference, Vol. 87. 27–32.
[239] Willenborg Leon and Waal Ton De. 1996. Statistical Disclosure Control in Practice. Vol. 111. Springer Science & Business Media.
[240] Willenborg Leon Cornelis Roelof Johannes and Waal Ton De. 2000. Elements of Statistical Disclosure Control. Lecture Notes in Statistics, Vol. 144. Springer.
[241] Wilson Rick L. and Rosen Peter A. 2003. Protecting data through perturbation techniques: The impact on knowledge discovery in databases. Journal of Database Management 14, 2 (2003), 14–26.
[242] Wong Raymond Chi-Wing, Li Jiuyong, Fu Ada Wai-Chee, and Wang Ke. 2006. (\(\alpha\), k)-Anonymity: An enhanced k-anonymity model for privacy preserving data publishing. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 754–759.
[243] Xiao Xiaokui and Tao Yufei. 2006. Anatomy: Simple and effective privacy preservation. In Proceedings of the 32nd International Conference on Very Large Data Bases. 139–150.
[244] Xiao Xiaokui and Tao Yufei. 2007. M-invariance: Towards privacy preserving re-publication of dynamic datasets. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. 689–700.
[245] Xie Liyang, Lin Kaixiang, Wang Shu, Wang Fei, and Zhou Jiayu. 2018. Differentially private generative adversarial network. arXiv preprint arXiv:1802.06739 (2018).
[246] Xu Jian, Wang Wei, Pei Jian, Wang Xiaoyuan, Shi Baile, and Fu Ada Wai-Chee. 2006. Utility-based anonymization using local recoding. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–790.
[247] Xu Lei, Skoularidou Maria, Cuesta-Infante Alfredo, and Veeramachaneni Kalyan. 2019. Modeling tabular data using conditional GAN. In Advances in Neural Information Processing Systems 32.
[248] Yale Andrew, Dash Saloni, Dutta Ritik, Guyon Isabelle, Pavao Adrien, and Bennett Kristin P. 2020. Generation and evaluation of privacy preserving synthetic health data. Neurocomputing 416 (2020), 244–255.
[249] YData. 2019. YData. Retrieved December 1, 2022 from https://ydata.ai/.
[250] YData. 2021. YData Synthetic. Retrieved December 1, 2022 from https://github.com/ydataai/ydata-synthetic.
[251] Ye Yifan, Wang Lixxia, Han Jianmin, Qiu Sheng, and Luo Fangwei. 2017. An anonymization method combining anatomy and permutation for protecting privacy in microdata with multiple sensitive attributes. In Proceedings of the 2017 International Conference on Machine Learning and Cybernetics (ICMLC’17), Vol. 2. IEEE, Los Alamitos, CA, 404–411.
[252] Yeom Samuel, Giacomelli Irene, Fredrikson Matt, and Jha Somesh. 2018. Privacy risk in machine learning: Analyzing the connection to overfitting. In Proceedings of the 2018 IEEE 31st Computer Security Foundations Symposium (CSF’18). IEEE, Los Alamitos, CA, 268–282.
[253] Zhang Qing, Koudas Nick, Srivastava Divesh, and Yu Ting. 2007. Aggregate query answering on anonymized tables. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering. 116–125.
[254] Zhao Benjamin Zi Hao, Agrawal Aviral, Coburn Catisha, Asghar Hassan Jameel, Bhaskar Raghav, Kaafar Mohamed Ali, Webb Darren, and Dickinson Peter. 2021. On the (in) feasibility of attribute inference attacks on machine learning models. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy (EuroS&P’21). IEEE, Los Alamitos, CA, 232–251.
[255] Zhiwei Kong, Weimin Wei, Shuo Yang, Hua Feng, and Yan Zhao. 2017. Research progress of anonymous data release. In Proceedings of the 2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS’17). IEEE, Los Alamitos, CA, 226–230.
[256] Zigomitros Athanasios, Casino Fran, Solanas Agusti, and Patsakis Constantinos. 2020. A survey on privacy properties for data publishing of relational data. IEEE Access 8 (2020), 51071–51099.
[257] Zorarpacı Ezgi and Özel Selma Ayşe. 2020. Privacy preserving classification over differentially private data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. Early access, December 13, 2020.

Index Terms

Survey on Privacy-Preserving Techniques for Microdata Publication
1. Computing methodologies
  1. Machine learning
    1. Learning settings
2. Security and privacy
  1. Database and storage security
    1. Data anonymization and sanitization
  2. Human and societal aspects of security and privacy

Published in
ACM Computing Surveys Volume 55, Issue 14s
December 2023
1355 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3606253
Editor:
Albert Zomaya
University of Sydney, Australia
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 July 2023
- Online AM: 28 March 2023
- Accepted: 10 March 2023
- Revised: 25 December 2022
- Received: 19 January 2022
Author Tags
Data privacy
microdata
statistical disclosure control
privacy-preserving techniques
predictive performance
Qualifiers
- survey