Biomedical Entity Normalization Using Encoder Regularization and Dynamic Ranking Mechanism

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14302)

Abstract

Biomedical entity normalization is a fundamental task underlying many downstream applications. Because medical dictionaries provide rich additional information about biomedical entities, such as synonyms and definitions, transformer-based models have recently been applied to learn semantic representations for normalization. Despite the high performance of transformer-based models, over-fitting remains a challenging and unsolved problem. In addition, although bi-encoder and cross-encoder structures are widely used in biomedical entity normalization, measuring distance in such encoder structures remains very challenging. Moreover, the triplet margin ranking loss is widely used in the reranking stage of entity normalization. In this paper, we propose an encoder-level regularization to restrain the over-fitting caused by the deep representations of the transformer. We also use a dynamic margin ranking mechanism in place of fixed margin selection in the reranking stage during training. We evaluate our model on three biomedical entity normalization datasets, and the empirical results show that it outperforms previous state-of-the-art models.
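To make the contrast between a fixed and a dynamic margin concrete, the following minimal Python sketch computes a hinge-style margin ranking loss for one positive and one negative candidate. The abstract does not state how the margin is varied during training, so the linear schedule, the function names (`margin_ranking_loss`, `linear_margin`), the margin values, and the toy scores are all assumptions made purely for illustration and should not be read as the authors' method.

```python
def margin_ranking_loss(pos_score, neg_score, margin):
    """Hinge-style ranking loss: zero once the positive candidate
    outscores the negative candidate by at least `margin`."""
    return max(0.0, margin - (pos_score - neg_score))


def linear_margin(step, total_steps, start=0.1, end=0.5):
    """Hypothetical schedule (an assumption, not taken from the paper):
    interpolate the margin linearly from `start` to `end` over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


# Toy reranker scores for one mention: (correct concept, wrong concept).
pos_score, neg_score = 0.8, 0.5

# Fixed margin: the score gap already exceeds 0.2, so the loss is zero.
fixed = margin_ranking_loss(pos_score, neg_score, margin=0.2)

# Dynamic margin: the same pair is penalised only once the margin has grown.
early = margin_ranking_loss(pos_score, neg_score, linear_margin(0, 100))
late = margin_ranking_loss(pos_score, neg_score, linear_margin(100, 100))

print(fixed, early, late)
```

Under these assumed values, the pair contributes no loss with the fixed margin or early in training, but contributes a positive loss once the scheduled margin exceeds the score gap, which is the kind of behaviour a margin that changes during training can produce.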

Notes

  1. https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE

  2. https://biocreative.bioinformatics.udel.edu/tasks/biocreative-v/track-3-cdr

Author information

Corresponding author

Correspondence to Hongbin Wang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, S. et al. (2023). Biomedical Entity Normalization Using Encoder Regularization and Dynamic Ranking Mechanism. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_39

  • DOI: https://doi.org/10.1007/978-3-031-44693-1_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44692-4

  • Online ISBN: 978-3-031-44693-1

  • eBook Packages: Computer Science, Computer Science (R0)
