Abstract
In this paper, we investigate how location access patterns influence the re-identification of seemingly anonymous data. In the real world, individuals visit different locations that gather similar information. For instance, multiple hospitals collect health information on the same patient. To protect anonymity when sharing data for research purposes, hospitals release sensitive data, such as DNA sequences, stripped of explicit identifiers. Separately, for administrative functions, identified data, stripped of DNA, is made available. On a hospital-by-hospital basis, each pair of DNA and identified databases appears unlinkable; however, links can be established when the databases of multiple locations are studied together. This problem, known as trail re-identification, is a general phenomenon and occurs because an individual's location access pattern can be matched across the shared databases.
Data holders cannot exchange data to find and suppress trails that would be re-identified. Thus, it is important to assess the re-identification risk in a system in order to develop techniques to mitigate it. In this research, we evaluate several real-world datasets and observe that trail re-identification is related to the number of people to places. To study this phenomenon in more detail, we develop a generative model for location access patterns that simulates observed behavior. We evaluate trail re-identification risk in a range of simulated patterns, and our findings suggest that the skew of the distribution of people to places is one of the main factors that drive trail re-identification.
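To make the idea of trail matching concrete, the following is a minimal sketch, not the algorithm evaluated in the paper. It assumes the simplest case, in which every location releases both a de-identified DNA table and an identified table covering all of its visitors, so a person's DNA trail and identity trail are the same set of locations. The hospital labels, record keys, names, and the functions `build_trails` and `match_trails` are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def build_trails(releases):
    """Map each record key to the set of locations whose release contains it."""
    trails = defaultdict(set)
    for location, keys in releases.items():
        for key in keys:
            trails[key].add(location)
    return {key: frozenset(locs) for key, locs in trails.items()}


def group_by_trail(trails):
    """Group record keys that share exactly the same trail."""
    groups = defaultdict(list)
    for key, trail in trails.items():
        groups[trail].append(key)
    return groups


def match_trails(dna_releases, id_releases):
    """Link a DNA record to a name when their trails are equal and that
    trail is held by exactly one DNA record and exactly one name.

    Assumption (not from the paper): releases are complete, so exact
    equality of trails is the right matching criterion.
    """
    dna_groups = group_by_trail(build_trails(dna_releases))
    id_groups = group_by_trail(build_trails(id_releases))
    links = {}
    for trail, dna_keys in dna_groups.items():
        names = id_groups.get(trail, [])
        if len(dna_keys) == 1 and len(names) == 1:
            links[dna_keys[0]] = names[0]
    return links


# Hypothetical toy data: three hospitals, four patients.
dna_releases = {
    "H1": {"dna_a", "dna_b", "dna_c"},
    "H2": {"dna_a", "dna_c", "dna_d"},
    "H3": {"dna_b", "dna_d"},
}
id_releases = {
    "H1": {"Alice", "Bob", "Carol"},
    "H2": {"Alice", "Carol", "Dave"},
    "H3": {"Bob", "Dave"},
}

links = match_trails(dna_releases, id_releases)
print(sorted(links.items()))  # [('dna_b', 'Bob'), ('dna_d', 'Dave')]
```

Within any single hospital in this toy example, each DNA record could belong to any of the listed patients. Across hospitals, the two records with a unique visit pattern are linked, while the two patients who share the trail {H1, H2} remain ambiguous.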
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Malin, B., Airoldi, E. (2006). The Effects of Location Access Behavior on Re-identification Risk in a Distributed Environment. In: Danezis, G., Golle, P. (eds) Privacy Enhancing Technologies. PET 2006. Lecture Notes in Computer Science, vol 4258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957454_24
DOI: https://doi.org/10.1007/11957454_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68790-0
Online ISBN: 978-3-540-68793-1
eBook Packages: Computer Science, Computer Science (R0)