Abstract
Given the large volumes of detailed data now being collected, there is high demand for the release of this data for research purposes. Organizations therefore face the conflicting goals of (a) releasing this data and (b) protecting the privacy of the individuals to whom the data pertains. The conflict is particularly sharp for geographic data: precise geographic information is essential to many healthcare research fields, such as spatial epidemiology, yet the same information must be censored or generalized for the sake of privacy protection. The challenge is to anonymize data so that it complies with government privacy policies while losing as little geographic information as possible. In this paper, we present novel component approaches used to configure the Voronoi-Based Aggregation System (VBAS), together with an in-depth comparison of their effectiveness. VBAS protects privacy by enforcing k-anonymity through the aggregation of fine-grained regions into larger regions. We also discuss heuristics rooted in linear programming that we have integrated into the system. Based on extensive comparisons, we highlight the strengths and weaknesses of the approaches we tested. This enables us to recommend specific combinations of approaches for satisfying particular user requirements.
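To make the idea of enforcing k-anonymity by aggregation concrete, the following is a minimal sketch and not the VBAS algorithm itself. It assumes each fine-grained region is given as a hypothetical `(x, y, population)` tuple, and it swaps in a simple greedy rule (merge the least populated group into the group with the nearest centroid) in place of the Voronoi-based and linear-programming approaches compared in the paper. The function name `aggregate_to_k` and the data layout are illustrative assumptions.

```python
from math import dist


def aggregate_to_k(regions, k):
    """Greedily merge regions until every group holds at least k individuals.

    regions: list of (x, y, population) tuples (hypothetical input format).
    Returns a list of dicts with keys "members", "x", "y", "pop".
    If the total population is below k, a single group below k is returned.
    """
    groups = [
        {"members": [i], "x": x, "y": y, "pop": p}
        for i, (x, y, p) in enumerate(regions)
    ]
    while len(groups) > 1:
        # Pick the group with the smallest population.
        i = min(range(len(groups)), key=lambda j: groups[j]["pop"])
        if groups[i]["pop"] >= k:
            break  # every group already satisfies the threshold
        small = groups.pop(i)
        # Merge it into the group whose centroid is closest.
        nearest = min(
            groups,
            key=lambda g: dist((g["x"], g["y"]), (small["x"], small["y"])),
        )
        total = nearest["pop"] + small["pop"]
        if total > 0:
            # Population-weighted centroid of the merged group.
            nearest["x"] = (nearest["x"] * nearest["pop"] + small["x"] * small["pop"]) / total
            nearest["y"] = (nearest["y"] * nearest["pop"] + small["y"] * small["pop"]) / total
        nearest["pop"] = total
        nearest["members"] += small["members"]
    return groups


if __name__ == "__main__":
    # Four illustrative regions on a line; k = 5.
    demo = [(0, 0, 3), (1, 0, 4), (10, 0, 12), (11, 0, 2)]
    for group in aggregate_to_k(demo, k=5):
        print(sorted(group["members"]), group["pop"])
```

Each merge trades geographic precision for privacy: the larger a merged group becomes, the coarser the location that can be released for its members. This is the loss that the approaches compared in the paper aim to reduce.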
Acknowledgements
The authors gratefully acknowledge financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grants Nos. RGPIN-2015-05390 and RGPIN-2016-06253.
Cite this article
Croft, W.L., Shi, W., Sack, JR. et al. Comparison of approaches of geographic partitioning for data anonymization. J Geogr Syst 19, 221–248 (2017). https://doi.org/10.1007/s10109-017-0251-4
DOI: https://doi.org/10.1007/s10109-017-0251-4