Abstract
Disease name normalization aims to map the varied disease names found in text to entries in a standardized disease vocabulary. Disease names vary so widely that dictionary lookup cannot achieve high accuracy on this task. DNorm is the first machine learning approach to this task, but it is not robust enough because it depends strongly on its training dataset. In this article, we propose a deep learning approach to disease name representation and normalization. Representations of the component words are learned from a large unlabelled literature corpus, and rich semantic and syntactic properties of disease names are encoded in the representations during this process. With these new representations of disease names, higher accuracy is achieved on the normalization task.
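The general idea of normalizing through representations can be sketched as follows. This is only an illustrative sketch, not the model proposed in the paper: the abstract does not specify how word representations are composed, so plain averaging is assumed here as a stand-in, and the word vectors, the identifiers `X1` and `X2`, and the function names are all hypothetical.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# In a real system these would be learned from an unlabelled corpus.
WORD_VECTORS = {
    "breast": [0.9, 0.1, 0.0],
    "cancer": [0.1, 0.9, 0.1],
    "carcinoma": [0.2, 0.8, 0.1],
    "neoplasms": [0.1, 0.8, 0.2],
    "colorectal": [0.0, 0.1, 0.9],
}
DIM = 3

# Hypothetical vocabulary: made-up identifier -> standardized name.
VOCABULARY = {
    "X1": "breast neoplasms",
    "X2": "colorectal neoplasms",
}


def embed_name(name):
    """Average the vectors of the known words in a disease name.

    Averaging is an assumption made for this sketch; it is not the
    composition method described in the paper.
    """
    vectors = [WORD_VECTORS[w] for w in name.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def normalize(mention):
    """Return the identifier of the vocabulary entry closest to the mention."""
    mention_vec = embed_name(mention)
    return max(
        VOCABULARY,
        key=lambda cid: cosine(mention_vec, embed_name(VOCABULARY[cid])),
    )


if __name__ == "__main__":
    for mention in ("breast cancer", "colorectal carcinoma"):
        print(mention, "->", normalize(mention))
```

Neither mention matches a vocabulary entry as an exact string, which is the case where dictionary lookup fails. With these toy vectors, each mention is still closest to the entry that shares its semantic neighbourhood.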
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Liu, H., Xu, Y. (2018). A Deep Learning Way for Disease Name Representation and Normalization. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_13
DOI: https://doi.org/10.1007/978-3-319-73618-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science (R0)