
Recognition and pseudonymisation of medical records for secondary use

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

Health records rank among the most sensitive personal information existing today. An unwanted disclosure to unauthorised parties usually results in significant negative consequences for an individual. Therefore, health records must be adequately protected in order to ensure the individual’s privacy. However, health records are also valuable resources for clinical studies and research activities. In order to make the records available for privacy-preserving secondary use, thorough de-personalisation is a crucial prerequisite to prevent re-identification. This paper introduces MEDSEC, a system which automatically converts paper-based health records into de-personalised and pseudonymised documents which can be accessed by secondary users without compromising the patients’ privacy. The system converts the paper-based records into a standardised structure that facilitates automated processing and the search for useful information.
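The pseudonymisation step mentioned above can be illustrated with a minimal sketch. It is not MEDSEC's actual method, which this abstract does not specify. It assumes that the identifying strings have already been recognised in the text, and it uses a keyed hash (HMAC-SHA256) as a stand-in pseudonymisation technique. The names `SECRET_KEY`, `pseudonym`, and `pseudonymise` are hypothetical and chosen only for this illustration.

```python
import hashlib
import hmac
import re
from typing import Sequence

# Hypothetical secret held by the data custodian (an assumption of this sketch).
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    # Normalise whitespace and case so spelling variants map to one pseudonym.
    normalised = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).hexdigest()
    return "PSN-" + digest[:12]


def pseudonymise(text: str, identifiers: Sequence[str],
                 key: bytes = SECRET_KEY) -> str:
    """Replace every occurrence of each known identifier with its pseudonym."""
    # Longest identifiers first, so a full name is replaced before any
    # shorter identifier contained in it.
    for ident in sorted(identifiers, key=len, reverse=True):
        replacement = pseudonym(ident, key)
        pattern = re.compile(re.escape(ident), flags=re.IGNORECASE)
        text = pattern.sub(lambda _match: replacement, text)
    return text


record = "Patient Jane Doe was admitted. Jane Doe reports chest pain."
result = pseudonymise(record, ["Jane Doe"])
print(result)
```

Because the same key always yields the same pseudonym, records belonging to one patient stay linkable for secondary users, while reversing the mapping would require the key. A real deployment would also need reliable identifier recognition and key management, both of which are outside this sketch.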

Notes

  1. http://www.ehr4cr.eu/index.cfm.

  2. http://eurecaproject.eu/.

  3. https://code.google.com/p/tesseract-ocr/.

  4. https://gate.ac.uk/.

  5. http://www.xitrust.com/produkte/xitrust-business-server/.

  6. https://www.i2b2.org/NLP/HeartDisease/.


Acknowledgments

We thank our business partners XiTrust Secure Technologies and Xylem Technologies for supporting the implementation of the case studies carried out within the MEDSEC project. The research was funded by BRIDGE (#824884), FFG-Austrian Research Promotion Agency, and supported by COMET K1, FFG-Austrian Research Promotion Agency.

Ethical standard

Since real-life records from a hospital archive with personal data were used in the case study, special care was taken to ensure the involved patients’ privacy. Access to the data was only allowed for the directly involved project members. Furthermore, the test data were only accessible within the archive computer network and records were not stored, copied, or processed outside the network environment.

Author information

Corresponding author

Correspondence to Johannes Heurix.

About this article

Cite this article

Heurix, J., Fenz, S., Rella, A. et al. Recognition and pseudonymisation of medical records for secondary use. Med Biol Eng Comput 54, 371–383 (2016). https://doi.org/10.1007/s11517-015-1322-7

  • DOI: https://doi.org/10.1007/s11517-015-1322-7
