Category Multi-representation: A Unified Solution for Named Entity Recognition in Clinical Texts

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Abstract

Clinical Named Entity Recognition (CNER), the task of identifying the entity boundaries in clinical texts, is essential for many applications. Previous methods usually follow traditional NER methods, which rely heavily on language-specific features (i.e. linguistics and lexicons) and high-quality annotated data. However, the limited availability of annotated data and the informality of clinical texts make CNER more challenging. In this paper, we propose a novel method that learns multiple representations for each category, namely category-multi-representation (CMR), which captures the semantic relatedness between words and clinical categories from different perspectives. CMR is learned from a large-scale unannotated corpus and a small set of annotated data, which greatly reduces the human effort required. Instead of language-specific features, our proposed method uses more evidential features without any additional NLP tools and can be adapted across languages with little effort. We conduct a series of experiments to verify that our new CMR features further improve NER performance significantly without leveraging any external lexicons.

Notes

  1. These methods extract lexicons from UMLS [3] or MeSH: https://www.nlm.nih.gov/mesh/meshhome.html.

  2. If one word belongs to multiple mentions, we simply choose the one with the highest frequency.

  3. In UMLS, each concept (entity) is represented by its CUI and is semantically classified into one of the semantic types.

  4. For those mentions mapping to unknown CUI, i.e. CUI-less.

  5. http://www.chokkan.org/software/crfsuite/.

  6. http://deeplearning.net/software/theano/.

  7. The same as the ones used in [10] except lexical features extracted from existing annotated tools.

References

  1. Abacha, A.B., Zweigenbaum, P.: Medical entity recognition: a comparison of semantic and statistical methods. In: BioNLP, pp. 56–64 (2011)

  2. Aronson, A.R., Lang, F.M.: An overview of MetaMap: historical perspective and recent advances. JAMIA 17, 229–236 (2010)

  3. Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology (2004)

  4. Bodnari, A., Deléger, L., Lavergne, T., Névéol, A., Zweigenbaum, P.: A supervised named-entity extraction system for medical text. In: Working Notes for CLEF Conference (2013)

  5. Bodnari, A., Deléger, L., Lavergne, T., Névéol, A., Zweigenbaum, P.: A supervised named-entity extraction system for medical text. In: CLEF (2013)

  6. de Bruijn, B., Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J. Am. Med. Inform. Assoc. 18(5), 557 (2011)

  7. Chapman, W.W., Chu, D., Dowling, J.N.: ConText: an algorithm for identifying contextual features from clinical text. In: BioNLP 2007, pp. 81–88 (2007)

  8. Dernoncourt, F., Lee, J.Y., Uzuner, Ö., Szolovits, P.: De-identification of patient notes with recurrent neural networks. JAMIA 24(3), 596–606 (2017)

  9. Donnelly, K.: SNOMED-CT: the advanced terminology and coding system for eHealth. Stud. Health Technol. Inform. 121, 279–90 (2006)

  10. Jiang, M., Chen, Y., Liu, M., Rosenbloom, S.T., Mani, S., Denny, J.C., Xu, H.: A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. JAMIA 18, 601–606 (2011)

  11. Johnson, A.E.W., Pollard, T.J., Shen, L., Lehman, L.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016)

  12. Kundeti, S.R., Vijayananda, J., Mujjiga, S., Kalyan, M.: Clinical named entity recognition: challenges and opportunities. In: 2016 IEEE International Conference on Big Data, pp. 1937–1945 (2016)

  13. Li, L., Jin, L., Jiang, Z., Song, D., Huang, D.: Biomedical named entity recognition based on extended recurrent neural networks. In: BIBM, pp. 649–652 (2015)

  14. Liu, S., Tang, B., Chen, Q., Wang, X.: Effects of semantic features on machine learning-based drug name recognition systems: word embeddings vs. manually constructed dictionaries. Information 6(4), 848–865 (2015)

  15. Meystre, S.M., Savova, G.K., Kipper-Schuler, K.C., Hurdle, J.F.: Extracting information from textual documents in the electronic health record: a review of recent research. Yearb. Med. Inform. 35, 128–144 (2008)

  16. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Proc. Syst. 26, 3111–3119 (2013)

  17. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)

  18. Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition. In: CoNLL (2009)

  19. Sadikin, M., Fanany, M.I., Basaruddin, T.: A new data representation based on training data characteristics to extract drug name entity in medical text. Comput. Intell. Neurosci. 2016, 16 (2016)

  20. Savova, G.K., Masanz, J.J., Ogren, P.V., Zheng, J., Sohn, S., Kipper-Schuler, K.C., Chute, C.G.: Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. JAMIA 17(5), 507–513 (2010)

  21. Settles, B.: Biomedical named entity recognition using conditional random fields and rich feature sets. In: JNLPBA, pp. 104–107 (2004)

  22. Shen, D., Zhang, J., Zhou, G., Su, J., Tan, C.L.: Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain. In: BioMed, pp. 49–56 (2003)

  23. Suominen, H., Salanterä, S., Velupillai, S., Chapman, W.W., Savova, G., Elhadad, N., Pradhan, S., South, B.R., Mowery, D.L., Jones, G.J.F., Leveling, J., Kelly, L., Goeuriot, L., Martinez, D., Zuccon, G.: Overview of the ShARe/CLEF eHealth evaluation lab 2013. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds.) CLEF 2013. LNCS, vol. 8138, pp. 212–231. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40802-1_24

  24. Takeuchi, K., Collier, N.: Bio-medical entity extraction using support vector machines. In: BioMed, pp. 57–64 (2003)

  25. Uzuner, Ö., South, B.R., Shen, S., DuVall, S.L.: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J. Am. Med. Inform. Assoc. 18(5), 552 (2011)

  26. Wei, Q., Chen, T., Xu, R., He, Y., Gui, L.: Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks. In: Database 2016 (2016)

  27. Xu, Y., Hua, J., Ni, Z., Chen, Q., Fan, Y., Ananiadou, S., Chang, E.I.C., Tsujii, J.: Anatomical entity recognition with a hierarchical framework augmented by external resources. PLoS ONE 9, 1–13 (2014)

  28. Zeng, D., Sun, C., Lin, L., Liu, B.: Enlarging drug dictionary with semi-supervised learning for drug entity recognition. In: BIBM, pp. 1929–1931 (2016)

  29. Zhao, Z., Yang, Z., Luo, L., Zhang, Y., Wang, L., Lin, H., Wang, J.: ML-CNN: a novel deep learning based disease named entity recognition architecture. In: BIBM, p. 794 (2016)

Acknowledgements

The work is supported by major national research and development projects (2017YFB1002101), NSFC key project (U1736204, 61661146007), Fund of Online Education Research Center, Ministry of Education (No. 2016ZD102), and THU-NUS NExT Co-Lab.

Author information

Corresponding author

Correspondence to Jiangtao Zhang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhang, J. et al. (2018). Category Multi-representation: A Unified Solution for Named Entity Recognition in Clinical Texts. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_22

  • DOI: https://doi.org/10.1007/978-3-319-93037-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93036-7

  • Online ISBN: 978-3-319-93037-4

  • eBook Packages: Computer Science, Computer Science (R0)
