A Survey of Anonymization Algorithms for Electronic Health Records

  • Chapter
Medical Data Privacy Handbook

Abstract

Electronic Health Records (EHRs) contain various types of structured data about patients, such as diagnoses, laboratory results, active medications, and allergies, which are increasingly shared to support a wide spectrum of medical analyses. To protect patient privacy, EHR data must be anonymized before they are shared. Anonymization prevents the re-identification of patients and/or the inference of their sensitive information, and it can be performed using several recently proposed algorithms. In this chapter, we survey popular anonymization algorithms for EHR data and explain their objectives, as well as the main aspects of their operation. We then present several promising directions for future research in this area.

Acknowledgements

Grigorios Loukides is partly supported by a Research Fellowship from the Royal Academy of Engineering.

Author information

Corresponding author

Correspondence to Aris Gkoulalas-Divanis.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Gkoulalas-Divanis, A., Loukides, G. (2015). A Survey of Anonymization Algorithms for Electronic Health Records. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_2

  • DOI: https://doi.org/10.1007/978-3-319-23633-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23632-2

  • Online ISBN: 978-3-319-23633-9

  • eBook Packages: Computer Science (R0)
