
Protein/Gene Entity Recognition and Normalization with Domain Knowledge and Local Context

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11831)

Abstract

Biomedical named entity recognition and normalization aim to recognize biomedical entity mentions in text and map them to their unique database entity identifiers (IDs); together they form a primary task of biomedical text mining. However, name variation and entity ambiguity make this task challenging. In this paper, we leverage domain knowledge through a novel knowledge feature representation method to recognize more entity variants, and we model important local context through a dual attention mechanism and a gating mechanism to perform entity normalization. Experimental results on the BioCreative VI Bio-ID corpus show that our proposed system achieves new state-of-the-art performance (an F1-score of 0.844 for protein/gene entity recognition and 0.408 for normalization).
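For readers unfamiliar with the reported metric, the following is a minimal sketch of how an entity-level F1-score can be computed from sets of gold and predicted mentions. It assumes strict matching on hypothetical `(start, end, type)` spans; the function name `prf1` and the example spans are illustrative only, and the official Bio-ID evaluation may apply different matching criteria.

```python
from typing import Set, Tuple

# A mention is represented as (start offset, end offset, entity type).
# This representation is an assumption made for illustration.
Span = Tuple[int, int, str]


def prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict span-and-type matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: two of the four predictions match the three gold mentions.
gold = {(0, 4, "protein"), (10, 15, "gene"), (20, 26, "protein")}
predicted = {(0, 4, "protein"), (10, 15, "gene"), (30, 33, "gene"), (40, 45, "gene")}

precision, recall, f1 = prf1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

For normalization, the same function could be applied to sets of (mention, database ID) pairs instead of typed spans, again as an assumption about the evaluation setup.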



Acknowledgments

This work was supported by grants from the Humanities and Social Science project of the Ministry of Education (No. 17YJA740076) and the National Natural Science Foundation of China (No. 61772109). Comments from the audience of CLSW 2019 and from the reviewers are also acknowledged.

Author information

Corresponding author

Correspondence to Zongze Li.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yao, W., Li, X., Li, Z., Liu, Z., Ning, S. (2020). Protein/Gene Entity Recognition and Normalization with Domain Knowledge and Local Context. In: Hong, JF., Zhang, Y., Liu, P. (eds) Chinese Lexical Semantics. CLSW 2019. Lecture Notes in Computer Science, vol 11831. Springer, Cham. https://doi.org/10.1007/978-3-030-38189-9_39

  • DOI: https://doi.org/10.1007/978-3-030-38189-9_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-38188-2

  • Online ISBN: 978-3-030-38189-9

  • eBook Packages: Computer Science, Computer Science (R0)
