The Effects of Location Access Behavior on Re-identification Risk in a Distributed Environment

  • Conference paper
Privacy Enhancing Technologies (PET 2006)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4258)

Abstract

In this paper, we investigate how location access patterns influence the re-identification of seemingly anonymous data. In the real world, individuals visit different locations that gather similar information. For instance, multiple hospitals collect health information on the same patient. To protect anonymity for research purposes, hospitals share sensitive data, such as DNA sequences, stripped of explicit identifiers. Separately, for administrative functions, they make identified data, stripped of DNA, available. On a hospital-by-hospital basis, each pair of DNA and identified databases appears unlinkable; however, links can be established when multiple locations' databases are studied together. This problem, known as trail re-identification, is a general phenomenon: it occurs because an individual's location access pattern can be matched across the shared databases.
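The trail idea can be illustrated with a minimal sketch. It assumes, as a simplification that is not necessarily the paper's setup, that each person's DNA record and identified record appear at exactly the same set of hospitals. Under that assumption it links a DNA record to an identity only when the two trails are identical and no other record on either side shares that trail. Published trail-matching algorithms also handle incomplete and overlapping trails; this exact-match rule is only a stand-in. The function names (`build_trails`, `reidentify_exact_unique`) and the sample data are hypothetical.

```python
from collections import defaultdict


def build_trails(records_by_location):
    """Map each entity to the frozenset of locations where it appears.

    records_by_location: dict of location -> iterable of entity keys seen there.
    """
    trails = defaultdict(set)
    for location, entities in records_by_location.items():
        for entity in entities:
            trails[entity].add(location)
    return {entity: frozenset(locs) for entity, locs in trails.items()}


def reidentify_exact_unique(dna_by_location, ids_by_location):
    """Hypothetical linker: pair a DNA record with an identity when both have
    the same trail and that trail is unique on both sides (exact-match assumption)."""
    dna_trails = build_trails(dna_by_location)
    id_trails = build_trails(ids_by_location)

    # Group records on each side by their trail pattern.
    dna_groups = defaultdict(list)
    for dna, trail in dna_trails.items():
        dna_groups[trail].append(dna)
    id_groups = defaultdict(list)
    for identity, trail in id_trails.items():
        id_groups[trail].append(identity)

    links = {}
    for trail, dnas in dna_groups.items():
        ids = id_groups.get(trail, [])
        # A pattern held by exactly one record on each side yields a link.
        if len(dnas) == 1 and len(ids) == 1:
            links[dnas[0]] = ids[0]
    return links


if __name__ == "__main__":
    # Hypothetical releases from three hospitals: de-identified DNA and identified names.
    dna = {"H1": ["seqA", "seqB"], "H2": ["seqA", "seqC"], "H3": ["seqB", "seqC"]}
    ids = {"H1": ["Alice", "Bob"], "H2": ["Alice", "Carol"], "H3": ["Bob", "Carol"]}
    print(reidentify_exact_unique(dna, ids))
```

Each hospital alone holds two DNA sequences and two names, so no single hospital's release reveals which sequence belongs to whom. Combining all three releases gives three distinct trails, and the sketch links every sequence to a name.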

Data holders cannot exchange data to find and suppress trails that would be re-identified. Thus, it is important to assess the re-identification risk in a system in order to develop techniques to mitigate it. In this research, we evaluate several real-world datasets and observe that trail re-identification is related to how people are distributed across places. To study this phenomenon in more detail, we develop a generative model for location access patterns that simulates observed behavior. We evaluate trail re-identification risk across a range of simulated patterns, and our findings suggest that the skew of the distribution of people to places is one of the main factors driving trail re-identification.
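The role of skew can be explored with a toy simulation. The generator below is hypothetical and is not the authors' generative model: each person visits a fixed number of distinct locations, drawn with Zipf-like popularity weights w_j = 1/j^s, where the skew parameter s = 0 gives uniform popularity. Risk is approximated as the fraction of people whose trail no one else shares, which under an exact-match linking assumption is the set of trails that could be re-identified. All names and parameter values are illustrative.

```python
import random
from collections import Counter


def simulate_trails(num_people, num_locations, visits_per_person, skew, seed=0):
    """Hypothetical generator: each person visits distinct locations drawn with
    Zipf-like weights w_j = 1 / j**skew (j = 1..num_locations; skew=0 is uniform)."""
    if visits_per_person > num_locations:
        raise ValueError("visits_per_person cannot exceed num_locations")
    rng = random.Random(seed)
    locations = list(range(num_locations))
    weights = [1.0 / (j + 1) ** skew for j in locations]
    trails = []
    for _ in range(num_people):
        chosen = set()
        # Redraw until the person has the required number of distinct locations.
        while len(chosen) < visits_per_person:
            chosen.add(rng.choices(locations, weights=weights, k=1)[0])
        trails.append(frozenset(chosen))
    return trails


def fraction_unique(trails):
    """Share of people whose trail pattern is held by nobody else."""
    if not trails:
        return 0.0
    counts = Counter(trails)
    return sum(1 for t in trails if counts[t] == 1) / len(trails)


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0, 2.0):
        trails = simulate_trails(num_people=1000, num_locations=20,
                                 visits_per_person=3, skew=s)
        print(f"skew={s}: fraction of unique trails = {fraction_unique(trails):.3f}")
```

In this toy setup, a larger skew concentrates visits on a few popular locations, so more people share the same trail and fewer trails are unique. The sketch only shows how skew can be varied and measured; it does not reproduce the paper's model or results.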

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Malin, B., Airoldi, E. (2006). The Effects of Location Access Behavior on Re-identification Risk in a Distributed Environment. In: Danezis, G., Golle, P. (eds) Privacy Enhancing Technologies. PET 2006. Lecture Notes in Computer Science, vol 4258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957454_24

  • DOI: https://doi.org/10.1007/11957454_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68790-0

  • Online ISBN: 978-3-540-68793-1

  • eBook Packages: Computer Science (R0)