Abstract
Biomedical entity normalization is a fundamental method for lots of downstream applications. Due to the rich additional information for biomedical entities in medical dictionaries, such as synonyms or definitions, transformer-based models are applied to dig semantic representations in normalization recently. Despite the high performance of the transformer-based model, the over-fitting problem remains challenging and unsolved. Besides, bi-encoder structure and cross-encoder structure are popularly applied in many biomedical entity normalization works, the issue to measure the distance of such encoder structures is very challenging. Moreover, the triples margin ranking loss mechanism is widely used in reranking stage of entity normalization. In this paper, we proposed an encoder-level regularization to restrain the over-fitting problem caused by the deep representation of transformer. Moreover, we use a dynamic margin ranking mechanism instead of fixed margin selection in reranking stage during training. In this way, we experiment our model on three biomedical entity normalization datasets, and the empirical results outperform previous state-of-the-art models.
Similar content being viewed by others
References
Dogan, R.I., Murray, G.C., Névéol, A., Lu, Z.: Understanding pubmed® user search behavior through log analysis. In: Database 2009 (2009)
Leaman, R., Doğan, R.I., Lu, Z.: DNorm: disease name normalization with pairwise learning to rank. Bioinformatics 29(22), 2909–2917 (2013)
Wei, C.-H., Kao, H.-Y., Lu, Z.: GNormPlus: an integrative approach for tagging genes, gene families, and protein domains. BioMed Res. Int. 2015 (2015)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Wu, L., et al.: R-drop: regularized dropout for neural networks. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Bhowmik, R., Stratos, K., de Melo, G.: Fast and effective biomedical entity linking using a dual encoder. arXiv preprint arXiv:2103.05028 (2021)
Xu, D., Zhang, Z., Bethard, S.: A generate-and-rank framework with semantic type regularization for biomedical concept normalization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 8452–8464 (2020)
Luan, Y., Eisenstein, J., Toutanova, K., Collins, M.: Sparse, dense, and attentional representations for text retrieval. Trans. Assoc. Comput. Linguist. 9, 329–345 (2021)
Sung, M., Jeon, H., Lee, J., Kang, J.: Biomedical entity representations with synonym marginalization. arXiv preprint arXiv:2005.00239 (2020)
Yan, C., Zhang, Y., Liu, K., Zhao, J., Shi, Y., Liu, S.: Biomedical concept normalization by leveraging hypernyms. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3512–3517 (2021)
Li, H., et al.: CNN-based ranking for biomedical entity normalization. BMC Bioinform. 18(11), 79–86 (2017)
Fakhraei, S., Mathew, J., Ambite, J.L.: NSEEN: neural semantic embedding for entity normalization. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds.) ECML PKDD 2019. LNCS (LNAI), vol. 11907, pp. 665–680. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-46147-8_40
Lee, J., et al.: BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4), 1234–1240 (2020)
Vashishth, S., Joshi, R., Newman-Griffis, D., Dutt, R., Rose, C.: Med-type: improving medical entity linking with semantic type prediction. arxiv e-prints, page. arXiv preprint arXiv:2005.00460 (2020)
Gao, L., Dai, Z., Callan, J.: Modularized transfomer-based ranking framework. arXiv preprint arXiv:2004.13313 (2020)
Zhang, W., Hua, W., Stratos, K.: EntQA: entity linking as question answering. arXiv preprint arXiv:2110.02369 (2021)
Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUS. IEEE Trans. Big Data 7(3), 535–547 (2019)
Davis, A.P., Wiegers, T.C., Rosenstein, M.C., Mattingly, C.J.: MEDIC: a practical disease vocabulary used at the comparative toxicogenomics database. Database 2012, bar065 (2012)
Davis, A.P., et al.: The comparative toxicogenomics database: update 2019. Nucl. Acids Res. 47(D1), D948–D954 (2019)
Gillick, D., et al.: Learning dense representations for entity retrieval. arXiv preprint arXiv:1909.10506 (2019)
Wu, L., Petroni, F., Josifoski, M., Riedel, S., Zettlemoyer, L.: Scalable zero-shot entity linking with dense entity retrieval. arXiv preprint arXiv:1911.03814 (2019)
Zhang, W., Stratos, K.: Understanding hard negatives in noise contrastive estimation. arXiv preprint arXiv:2104.06245 (2021)
Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)
Li, J., et al.: Biocreative V CDR task corpus: a resource for chemical disease relation extraction. In: Database 2016 (2016)
D’Souza, J., Ng, V.: Sieve-based entity linking for the biomedical domain. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 297–302 (2015)
Wright, D.: NormCo: Deep Disease Normalization for Biomedical Knowledge Base Construction. University of California, San Diego (2019)
Phan, M.C., Sun, A., Tay, Y.: Robust representation learning of biomedical names. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3275–3285 (2019)
Ji, Z., Wei, Q., Hua, X.: Bert-based ranking for biomedical entity normalization. AMIA Summits Transl. Sci. Proc. 2020, 269 (2020)
Mondal, I., et al.: Medical entity linking using triplet network. arXiv preprint arXiv:2012.11164 (2020)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Leaman, R., Zhiyong, L.: TaggerOne: joint named entity recognition and normalization with semi-Markov models. Bioinformatics 32(18), 2839–2846 (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, S. et al. (2023). Biomedical Entity Normalization Using Encoder Regularization and Dynamic Ranking Mechanism. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science(), vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_39
Download citation
DOI: https://doi.org/10.1007/978-3-031-44693-1_39
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44692-4
Online ISBN: 978-3-031-44693-1
eBook Packages: Computer ScienceComputer Science (R0)