ArRaNER: A novel named entity recognition model for biomedical literature documents

The Journal of Supercomputing

Abstract

Advances in technology have produced an immense amount of digital information, and much of the knowledge in this data deluge is hidden and difficult to extract. The biomedical domain is no exception: it now generates voluminous textual data, and processing that text is referred to as ‘biomedical content mining’. Emerging artificial intelligence (AI) models play a major role in the automation of Pharma 4.0, and within AI, natural language processing (NLP) is central to extracting knowledge from biomedical documents. Research articles published by scientists and researchers contain an enormous amount of hidden information, and most original, peer-reviewed articles are indexed in PubMed. Extracting meaningful information from so many literature documents by hand is very difficult. This research aims to extract named entities from literature documents in the life science domain. It proposes a high-level architecture together with a novel named entity recognition (NER) model, ArRaNER, built using rule-based machine learning (ML). The proposed ArRaNER model produced better accuracy and identified more entities. The model was tested on two datasets, a PubMed dataset and a Wikipedia talk dataset, and obtained an accuracy of 83.42% on the PubMed articles and 77.65% on the Wikipedia articles.
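To make the idea of rule-based entity tagging and an accuracy score concrete, here is a minimal Python sketch. It is not the ArRaNER model: the abstract does not describe its rules, so the gazetteer entries, the gene-symbol pattern, the label names, and the token-level accuracy definition below are all illustrative assumptions.

```python
import re

# Hypothetical gazetteer: illustrative entries only, not taken from the paper.
GAZETTEER = {
    "aspirin": "CHEMICAL",
    "ibuprofen": "CHEMICAL",
    "diabetes": "DISEASE",
    "asthma": "DISEASE",
}

# Hypothetical pattern rule: gene-like symbols such as BRCA1 or TP53.
GENE_PATTERN = re.compile(r"^[A-Z]{2,5}[0-9]{1,2}$")


def tag_tokens(tokens):
    """Assign one label per token: a gazetteer hit, a pattern hit, or 'O'."""
    labels = []
    for token in tokens:
        if token.lower() in GAZETTEER:
            labels.append(GAZETTEER[token.lower()])
        elif GENE_PATTERN.match(token):
            labels.append("GENE")
        else:
            labels.append("O")
    return labels


def token_accuracy(predicted, gold):
    """Fraction of tokens whose predicted label equals the gold label.

    Assumed metric: the abstract does not say how accuracy was computed.
    """
    if len(predicted) != len(gold):
        raise ValueError("label sequences must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Made-up example sentence with hand-written gold labels.
tokens = "The study measured aspirin response and BRCA1 status in asthma patients".split()
gold = ["O", "O", "O", "CHEMICAL", "O", "O", "GENE", "O", "O", "DISEASE", "O"]
predicted = tag_tokens(tokens)
print(list(zip(tokens, predicted)))
print(token_accuracy(predicted, gold))
```

A real system would need multi-token entities, a much larger vocabulary, and an evaluation on annotated corpora; the sketch only shows how explicit rules map tokens to entity labels and how a single accuracy figure could be derived from them.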

Author information

Corresponding author

Correspondence to R. Ramachandran.

About this article

Cite this article

Ramachandran, R., Arutchelvan, K. ArRaNER: A novel named entity recognition model for biomedical literature documents. J Supercomput 78, 16498–16511 (2022). https://doi.org/10.1007/s11227-022-04527-y

  • DOI: https://doi.org/10.1007/s11227-022-04527-y
