The cloud4health Project: Secondary Use of Clinical Data with Secure Cloud-Based Text Mining Services

Chapter in Scientific Computing and Algorithms in Industrial Simulations

Abstract

Advances in translational and personalized medicine require the integration of multiple patient-related resources across different organizations. Secure cloud environments for processing, storing and integrating large volumes of data are therefore needed. Moreover, the integration of clinical patient data is indispensable for translational research. Although electronic health record systems are in operational use in most hospitals, many clinically and phenotypically relevant parameters can only be found in unstructured texts such as medical records and reports. To meet these challenges, the cloud4health project established a cloud-based text mining platform that facilitates information extraction from biomedical texts in a secure cloud environment. To comply with privacy regulations, general technical requirements and security rules for such a cloud installation were developed and implemented. Several clinical use cases illustrate the wide range of applications for specific text mining services in a secure cloud environment. Two of these use cases, which apply text mining technologies to pathology and surgery reports, are analysed in detail.
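The abstract describes text mining services that turn pathology reports into structured data, but the extraction rules themselves are not shown here. As a rough illustration only, the following Python sketch assumes that a report states a TNM staging string (the TNM classification is maintained by the UICC, which is cited in the notes) and pulls out the T, N and M categories with one regular expression. `TNM_PATTERN`, `TnmStage` and `extract_tnm` are hypothetical names, and the rule is far simpler than what a real clinical text mining service would need.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule for TNM staging strings such as "pT2 pN1a M0" or "ypT1b N0".
# Prefixes (y, c, p, r) are accepted but not kept; real reports vary far more.
TNM_PATTERN = re.compile(
    r"\b[ycpr]{0,2}T(?P<t>[0-4][a-d]?|is|[xX])"   # T category, e.g. pT2, ypT1b, Tis
    r"\s*[ycp]{0,2}N(?P<n>[0-3][a-c]?|[xX])"      # N category, e.g. pN1a, N0
    r"(?:\s*[ycp]{0,2}M(?P<m>[01][a-c]?|[xX]))?"  # optional M category, e.g. cM0
)


@dataclass
class TnmStage:
    t: str
    n: str
    m: Optional[str]  # None if the report states no M category


def extract_tnm(report: str) -> list[TnmStage]:
    """Return every TNM staging string found in a free-text report."""
    return [
        TnmStage(t=match.group("t"), n=match.group("n"), m=match.group("m"))
        for match in TNM_PATTERN.finditer(report)
    ]


if __name__ == "__main__":
    text = "Invasive ductal carcinoma, pT2 pN1a M0, grade 2. R0 resection."
    print(extract_tnm(text))  # [TnmStage(t='2', n='1a', m='0')]
```

A regular expression keeps the sketch self-contained; it ignores negation, abbreviations and the many spelling variants found in real reports, which is why dictionary- and rule-based pipelines are typically used instead.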


Notes

  1. A recent example is the “Heartbleed” bug in the OpenSSL cryptography library. The code causing the vulnerability was found more than two years after it had been integrated into the library’s code base. The flawed code left approximately 20% of all Internet servers open to potential theft of private data (such as private keys).

  2. Note that the ID does not allow the patient to be identified, because the ID itself is already a result of the anonymization process. If an agreement with the responsible data protection officers can be reached, it would be preferable to merely pseudonymize the documents instead: text mining results could then be mapped back to individual patients, who could benefit from them.

  3. In cooperation with the Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.

  4. RHÖN-KLINIKUM AG, Bad Neustadt/Saale, Germany.

  5. http://www.uicc.org/.
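Note 2 contrasts anonymized documents, whose IDs cannot be linked back to a patient, with pseudonymized documents, whose text mining results could be mapped to a patient. How such a pseudonymization would be implemented is not described here. One common approach, assumed purely for illustration, is a keyed hash computed by a trusted party that alone holds the secret key. The function names and the patient ID format below are hypothetical.

```python
import hashlib
import hmac
import secrets


def anonymous_id() -> str:
    """Random document ID; no link to the patient is stored anywhere."""
    return secrets.token_hex(16)


def pseudonym(patient_id: str, key: bytes) -> str:
    """Keyed pseudonym: HMAC-SHA256 of the patient ID under a secret key.

    The same patient always gets the same pseudonym under the same key, but
    without the key nobody can recompute it from a guessed patient ID.
    """
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def results_for_patient(results_by_pseudonym: dict[str, list[str]],
                        patient_id: str, key: bytes) -> list[str]:
    """Map text mining results back to one patient; only possible with the key."""
    return results_by_pseudonym.get(pseudonym(patient_id, key), [])


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # assumed to be held only by a trusted party
    # In the cloud, results are stored under the pseudonym, never the patient ID.
    results = {pseudonym("patient-0815", key): ["pT2 pN1a M0"]}
    print(results_for_patient(results, "patient-0815", key))  # ['pT2 pN1a M0']
    print(anonymous_id())  # a fresh random ID that cannot be mapped back
```

The keyed construction matters: a plain unkeyed hash of the patient ID would let anyone who guesses an ID confirm it, whereas the HMAC can only be recomputed by whoever holds the key. Destroying the key would turn the pseudonyms into effectively anonymous IDs.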


Acknowledgements

The project cloud4health has been funded by the German Federal Ministry of Economics and Technology in the funding program “Trusted Cloud” (FKZ 01MD11009).

Besides Fraunhofer SCAI, four other partners participated in and contributed to the project. Averbis GmbH, located in Freiburg, coordinated cloud4health, set up the UIMA-based text mining environment in the cloud, and also developed text mining services. The Friedrich-Alexander-University Erlangen-Nuremberg and the RHÖN-KLINIKUM AG, Bad Neustadt/Saale, provided the clinical data and set up the clinical extraction workflow. Finally, TMF (Technology, Methods, and Infrastructure for Networked Medical Research), Berlin, was responsible for data protection issues.

Author information

Corresponding author

Correspondence to Juliane Fluck.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Fluck, J., Senger, P., Ziegler, W., Claus, S., Schwichtenberg, H. (2017). The cloud4health Project: Secondary Use of Clinical Data with Secure Cloud-Based Text Mining Services. In: Griebel, M., Schüller, A., Schweitzer, M. (eds) Scientific Computing and Algorithms in Industrial Simulations. Springer, Cham. https://doi.org/10.1007/978-3-319-62458-7_15
