Category Multi-representation: A Unified Solution for Named Entity Recognition in Clinical Texts

Zhang, Jiangtao; Li, Juanzi; Wang, Shuai; Zhang, Yan; Cao, Yixin; Hou, Lei; Li, Xiao-Li

doi:10.1007/978-3-319-93037-4_22


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Clinical Named Entity Recognition (CNER), the task of identifying entity boundaries in clinical texts, is essential for many applications. Previous methods usually follow traditional NER approaches, which rely heavily on language-specific features (i.e., linguistic features and lexicons) and on high-quality annotated data. However, the limited availability of annotated data and the informal nature of clinical texts make CNER more challenging. In this paper, we propose a novel method that learns multiple representations for each category, namely category multi-representation (CMR), which captures the semantic relatedness between words and clinical categories from different perspectives. CMR is learned from a large-scale unannotated corpus and a small set of annotated data, which greatly reduces the human annotation effort. Instead of language-specific features, our method uses more evidential features that require no additional NLP tools, and it can be adapted to other languages with little effort. A series of experiments verifies that the new CMR features significantly improve NER performance without leveraging any external lexicons.
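The abstract does not spell out how CMR is learned. As a hedged illustration of the general idea only (relating each word to each clinical category from several perspectives, using unannotated text plus a small annotated set), the Python sketch below assumes one representation per category per embedding space, computed as the centroid of the annotated words' vectors, and uses the cosine similarity between a word and each centroid as a feature. The centroid construction, the toy vectors, the category names, and all function names (`category_centroids`, `relatedness_features`, `multi_view_features`) are hypothetical; this is not the authors' CMR algorithm.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def category_centroids(embeddings, annotated):
    """Hypothetical: represent each category by the mean vector of the
    annotated words labelled with it. `annotated` holds (word, category) pairs."""
    sums = {}
    counts = defaultdict(int)
    for word, cat in annotated:
        vec = embeddings.get(word)
        if vec is None:
            continue  # word missing from this embedding space
        if cat not in sums:
            sums[cat] = [0.0] * len(vec)
        sums[cat] = [s + x for s, x in zip(sums[cat], vec)]
        counts[cat] += 1
    return {cat: [s / counts[cat] for s in total] for cat, total in sums.items()}


def relatedness_features(word, embeddings, centroids):
    """Cosine similarity of `word` to every category centroid, rounded to 3 places."""
    vec = embeddings.get(word)
    if vec is None:
        return {}  # unknown word: no relatedness evidence
    return {f"sim_{cat}": round(cosine(vec, c), 3) for cat, c in sorted(centroids.items())}


def multi_view_features(word, views, annotated):
    """One representation per category per embedding space ("view"), so each
    category ends up with several representations; features are concatenated."""
    feats = {}
    for view_name, embeddings in sorted(views.items()):
        centroids = category_centroids(embeddings, annotated)
        for name, value in relatedness_features(word, embeddings, centroids).items():
            feats[f"{view_name}:{name}"] = value
    return feats


# Toy 2-d vectors standing in for embeddings trained on an unannotated corpus.
views = {
    "view_a": {"aspirin": [1.0, 0.0], "ibuprofen": [0.8, 0.2], "fever": [0.0, 1.0]},
    "view_b": {"aspirin": [0.0, 1.0], "ibuprofen": [0.0, 1.0], "fever": [1.0, 0.0]},
}
annotated = [("aspirin", "Drug"), ("fever", "Problem")]
feats = multi_view_features("ibuprofen", views, annotated)
print(feats)
# {'view_a:sim_Drug': 0.97, 'view_a:sim_Problem': 0.243, 'view_b:sim_Drug': 1.0, 'view_b:sim_Problem': 0.0}
```

Because the features are computed from embeddings plus a handful of labelled words, nothing in the sketch depends on a tokenizer, tagger, or lexicon for a specific language; whether the paper's features are built this way is not stated on this page.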


Notes

1. These methods extract lexicons from UMLS [3] or MeSH: https://www.nlm.nih.gov/mesh/meshhome.html.
2. If one word belongs to multiple mentions, we simply choose the one with the highest frequency.
3. In UMLS, each concept (entity) is represented by its CUI (Concept Unique Identifier) and is semantically classified into one of the semantic types.
4. This covers mentions that map to an unknown CUI, i.e., CUI-less mentions.
5. http://www.chokkan.org/software/crfsuite/.
6. http://deeplearning.net/software/theano/.
7. The same features as those used in [10], except for the lexical features extracted from existing annotation tools.
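Note 2 states a simple disambiguation rule. A minimal Python sketch of that rule follows, assuming that "highest frequency" means the mention occurring most often in the annotated data and that mentions are split into words on whitespace; both are assumptions, and the function name `assign_word_to_mention` and its input format are hypothetical.

```python
from collections import Counter


def assign_word_to_mention(mentions):
    """Map each word to the most frequent annotated mention that contains it.

    `mentions` is a list of mention strings, repeated once per occurrence in the
    annotated data (hypothetical input format). On a frequency tie, the mention
    seen first wins, because Counter preserves insertion order.
    """
    mention_freq = Counter(mentions)
    best = {}
    for mention, freq in mention_freq.items():
        for word in set(mention.split()):
            current = best.get(word)
            if current is None or freq > mention_freq[current]:
                best[word] = mention
    return best


mentions = ["chest pain", "chest pain", "chest x-ray", "abdominal pain"]
best = assign_word_to_mention(mentions)
print(best["chest"], "|", best["pain"], "|", best["x-ray"])
# chest pain | chest pain | chest x-ray
```

The strict `>` comparison makes tie-breaking deterministic; a different tie rule would not change the core idea of keeping only the most frequent mention per word.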

References

[1] Abacha, A.B., Zweigenbaum, P.: Medical entity recognition: a comparison of semantic and statistical methods. In: BioNLP, pp. 56–64 (2011)
[2] Aronson, A.R., Lang, F.M.: An overview of MetaMap: historical perspective and recent advances. J. Am. Med. Inform. Assoc. 17, 229–236 (2010)
[3] Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology (2004)
[4] Bodnari, A., Deléger, L., Lavergne, T., Névéol, A., Zweigenbaum, P.: A supervised named-entity extraction system for medical text. In: Working Notes for CLEF Conference (2013)
[5] Bodnari, A., Deléger, L., Lavergne, T., Névéol, A., Zweigenbaum, P.: A supervised named-entity extraction system for medical text. In: CLEF (2013)
[6] de Bruijn, B., Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J. Am. Med. Inform. Assoc. 18(5), 557 (2011)
[7] Chapman, W.W., Chu, D., Dowling, J.N.: ConText: an algorithm for identifying contextual features from clinical text. In: BioNLP 2007, pp. 81–88 (2007)
[8] Dernoncourt, F., Lee, J.Y., Uzuner, Ö., Szolovits, P.: De-identification of patient notes with recurrent neural networks. J. Am. Med. Inform. Assoc. 24(3), 596–606 (2017)
[9] Donnelly, K.: SNOMED-CT: the advanced terminology and coding system for eHealth. Stud. Health Technol. Inform. 121, 279–290 (2006)
[10] Jiang, M., Chen, Y., Liu, M., Rosenbloom, S.T., Mani, S., Denny, J.C., Xu, H.: A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J. Am. Med. Inform. Assoc. 18, 601–606 (2011)
[11] Johnson, A.E.W., Pollard, T.J., Shen, L., Lehman, L.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016)
[12] Kundeti, S.R., Vijayananda, J., Mujjiga, S., Kalyan, M.: Clinical named entity recognition: challenges and opportunities. In: 2016 IEEE International Conference on Big Data, pp. 1937–1945 (2016)
[13] Li, L., Jin, L., Jiang, Z., Song, D., Huang, D.: Biomedical named entity recognition based on extended recurrent neural networks. In: BIBM, pp. 649–652 (2015)
[14] Liu, S., Tang, B., Chen, Q., Wang, X.: Effects of semantic features on machine learning-based drug name recognition systems: word embeddings vs. manually constructed dictionaries. Information 6(4), 848–865 (2015)
[15] Meystre, S.M., Savova, G.K., Kipper-Schuler, K.C., Hurdle, J.F.: Extracting information from textual documents in the electronic health record: a review of recent research. Yearb. Med. Inform. 35, 128–144 (2008)
[16] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Proc. Syst. 26, 3111–3119 (2013)
[17] Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)
[18] Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition. In: CoNLL (2009)
[19] Sadikin, M., Fanany, M.I., Basaruddin, T.: A new data representation based on training data characteristics to extract drug name entity in medical text. Comput. Intell. Neurosci. 2016, 16 (2016)
[20] Savova, G.K., Masanz, J.J., Ogren, P.V., Zheng, J., Sohn, S., Kipper-Schuler, K.C., Chute, C.G.: Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 17(5), 507–513 (2010)
[21] Settles, B.: Biomedical named entity recognition using conditional random fields and rich feature sets. In: JNLPBA, pp. 104–107 (2004)
[22] Shen, D., Zhang, J., Zhou, G., Su, J., Tan, C.L.: Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain. In: BioMed, pp. 49–56 (2003)
[23] Suominen, H., Salanterä, S., Velupillai, S., Chapman, W.W., Savova, G., Elhadad, N., Pradhan, S., South, B.R., Mowery, D.L., Jones, G.J.F., Leveling, J., Kelly, L., Goeuriot, L., Martinez, D., Zuccon, G.: Overview of the ShARe/CLEF eHealth evaluation lab 2013. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds.) CLEF 2013. LNCS, vol. 8138, pp. 212–231. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40802-1_24
[24] Takeuchi, K., Collier, N.: Bio-medical entity extraction using support vector machines. In: BioMed, pp. 57–64 (2003)
[25] Uzuner, Ö., South, B.R., Shen, S., DuVall, S.L.: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J. Am. Med. Inform. Assoc. 18(5), 552 (2011)
[26] Wei, Q., Chen, T., Xu, R., He, Y., Gui, L.: Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks. In: Database 2016 (2016)
[27] Xu, Y., Hua, J., Ni, Z., Chen, Q., Fan, Y., Ananiadou, S., Chang, E.I.C., Tsujii, J.: Anatomical entity recognition with a hierarchical framework augmented by external resources. PLoS ONE 9, 1–13 (2014)
[28] Zeng, D., Sun, C., Lin, L., Liu, B.: Enlarging drug dictionary with semi-supervised learning for drug entity recognition. In: BIBM, pp. 1929–1931 (2016)
[29] Zhao, Z., Yang, Z., Luo, L., Zhang, Y., Wang, L., Lin, H., Wang, J.: ML-CNN: a novel deep learning based disease named entity recognition architecture. In: BIBM, p. 794 (2016)


Acknowledgements

This work is supported by the major national research and development project (2017YFB1002101), the NSFC key projects (U1736204, 61661146007), the Fund of the Online Education Research Center, Ministry of Education (No. 2016ZD102), and the THU-NUS NExT Co-Lab.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Jiangtao Zhang, Juanzi Li, Shuai Wang, Yan Zhang, Yixin Cao & Lei Hou
Institute for Infocomm Research, A*STAR, Singapore, 138632, Singapore
Xiao-Li Li

Corresponding author

Correspondence to Jiangtao Zhang.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

About this paper

Cite this paper

Zhang, J. et al. (2018). Category Multi-representation: A Unified Solution for Named Entity Recognition in Clinical Texts. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_22

DOI: https://doi.org/10.1007/978-3-319-93037-4_22
Published: 20 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science, Computer Science (R0)
