A Survey of Anonymization Algorithms for Electronic Health Records

Gkoulalas-Divanis, Aris; Loukides, Grigorios

doi:10.1007/978-3-319-23633-9_2

Abstract

Electronic Health Records (EHRs) contain various types of structured data about patients, such as diagnoses, laboratory results, active medications, and allergies, which are increasingly shared to support a wide spectrum of medical analyses. To protect patient privacy, EHR data must be anonymized before they are shared. Anonymization aims to prevent the re-identification of patients and/or the inference of their sensitive information, and it can be achieved with several recently proposed algorithms. In this chapter, we survey popular data anonymization algorithms for EHR data and explain their objectives, as well as the main aspects of their operation. We then present several promising directions for future research in this area.

Acknowledgements

Grigorios Loukides is partly supported by a Research Fellowship from the Royal Academy of Engineering.

Author information

Authors and Affiliations

Smarter Cities Technology Centre, IBM Research, Dublin, Ireland
Aris Gkoulalas-Divanis
School of Computer Science & Informatics, Cardiff University, Cardiff, UK
Grigorios Loukides

Corresponding author

Correspondence to Aris Gkoulalas-Divanis.

Editor information

Editors and Affiliations

IBM Research - Ireland, Mulhuddart, Dublin, Ireland
Aris Gkoulalas-Divanis
Cardiff University, Cardiff, United Kingdom
Grigorios Loukides

About this chapter

Cite this chapter

Gkoulalas-Divanis, A., Loukides, G. (2015). A Survey of Anonymization Algorithms for Electronic Health Records. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_2

DOI: https://doi.org/10.1007/978-3-319-23633-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23632-2
Online ISBN: 978-3-319-23633-9
eBook Packages: Computer Science, Computer Science (R0)
