
Comparison of approaches of geographic partitioning for data anonymization

  • Original Article
  • Journal of Geographical Systems

Abstract

Given the large volumes of detailed data now being collected, there is a high demand for the release of this data for research purposes. Organizations therefore face the conflicting goals of (a) releasing this data and (b) protecting the privacy of the individuals to whom the data pertains. The conflict is especially sharp between the need to release precise geographic information (which is essential to many healthcare research fields such as spatial epidemiology) and the requirement to censor or generalize the same information for the sake of privacy protection. Ultimately, the challenge is to anonymize data so that it complies with government privacy policies while losing as little geographic information as possible. In this paper, we present novel component approaches used to configure the Voronoi-Based Aggregation System (VBAS), together with an in-depth comparison of their effectiveness. VBAS is a system that protects privacy by enforcing k-anonymity via the aggregation of regions of fine granularity into larger regions. We additionally discuss heuristics rooted in linear programming that we have integrated into our system. Based on extensive comparisons, we highlight the strengths and weaknesses of the different approaches we tested. This enables us to make recommendations on how to satisfy user requirements by selecting specific combinations of these approaches.
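To make the idea of enforcing k-anonymity by aggregation concrete, the following is a minimal sketch and not the VBAS algorithm itself. It assumes each fine-grained region can be summarized by a centroid and a population count, and it uses a simple greedy rule (merge the least populated region into its nearest neighbour by centroid distance) purely for illustration. The names `Region`, `merge`, and `aggregate` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class Region:
    """An aggregated region: a centroid, a head count, and its member ids."""
    x: float
    y: float
    population: int
    members: Tuple[str, ...]  # ids of the fine-grained regions merged so far


def merge(a: Region, b: Region) -> Region:
    """Combine two regions, keeping a population-weighted centroid."""
    total = a.population + b.population
    if total == 0:
        # No population to weight by: fall back to the plain midpoint.
        x, y = (a.x + b.x) / 2, (a.y + b.y) / 2
    else:
        x = (a.x * a.population + b.x * b.population) / total
        y = (a.y * a.population + b.y * b.population) / total
    return Region(x, y, total, a.members + b.members)


def aggregate(regions: List[Region], k: int) -> List[Region]:
    """Greedily merge the least populated region into its nearest neighbour
    until every region holds at least k individuals.

    Illustrative assumption: nearness is Euclidean distance between
    centroids. If the total population is below k, a single region with
    fewer than k individuals is returned, since k cannot be satisfied.
    """
    regions = list(regions)  # work on a copy, leave the input untouched
    while len(regions) > 1:
        smallest = min(regions, key=lambda r: r.population)
        if smallest.population >= k:
            break  # every region already satisfies the threshold
        regions.remove(smallest)
        nearest = min(
            regions,
            key=lambda r: hypot(r.x - smallest.x, r.y - smallest.y),
        )
        regions.remove(nearest)
        regions.append(merge(smallest, nearest))
    return regions


# Usage: three made-up regions on a line, with threshold k = 5.
fine = [
    Region(0.0, 0.0, 3, ("a",)),
    Region(1.0, 0.0, 4, ("b",)),
    Region(10.0, 0.0, 12, ("c",)),
]
coarse = aggregate(fine, k=5)
print(sorted((r.members, r.population) for r in coarse))
# -> [(('a', 'b'), 7), (('c',), 12)]
```

In this toy run, region `a` falls below the threshold and is merged with its nearest neighbour `b`, while `c` is left at its original granularity. The trade-off named in the abstract shows up directly: every merge improves privacy but coarsens the geography, so a practical system has to choose merges that keep the resulting regions as small as possible.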





Acknowledgements

The authors gratefully acknowledge financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grants Nos. RGPIN-2015-05390 and RGPIN-2016-06253.

Author information


Corresponding author

Correspondence to William Lee Croft.

Electronic supplementary material

The online version of this article includes electronic supplementary material (PDF, 1133 kB).


About this article


Cite this article

Croft, W.L., Shi, W., Sack, JR. et al. Comparison of approaches of geographic partitioning for data anonymization. J Geogr Syst 19, 221–248 (2017). https://doi.org/10.1007/s10109-017-0251-4


  • DOI: https://doi.org/10.1007/s10109-017-0251-4
