An Effective Biomedical Named Entity Recognition by Handling Imbalanced Data Sets Using Deep Learning and Rule-Based Methods

Archana, S. M.; Prakash, Jay; Singh, Pramod Kumar; Ahmed, Waquar

doi:10.1007/s42979-023-02068-6

Original Research
Published: 28 August 2023

Volume 4, article number 650, (2023)

SN Computer Science

S. M. Archana¹,
Jay Prakash¹,
Pramod Kumar Singh² &
…
Waquar Ahmed³

Abstract

Biomedical named entity recognition (Bio-NER) is a fundamental task in biomedical text mining. It has many applications in natural language processing, and its performance heavily affects the downstream tasks that depend on it. Class imbalance, however, is a major obstacle to effective Bio-NER. Rule-based and deep learning-based methods have been widely used over the past few years, but neither handles class imbalance effectively. We therefore propose a hybrid method, named Hybrid Bio-NER (HBio-NER), that takes advantage of both. We also propose a training data format called ‘single_entity-O’, which reduces the class imbalance problem. Data preprocessing together with the ‘single_entity-O’ representation improves the class balance, which in turn improves the performance of HBio-NER. HBio-NER identifies named entities using a Bidirectional Long Short-Term Memory (Bi-LSTM) model and determines entity boundaries (the beginning and inside of entity mentions) using rules. The method is evaluated on two data sets (the NCBI disease and CHEMDNER chemical data sets) with three word representation models (GloVe, Word2Vec and FastText). The comparative analysis indicates that the proposed method achieves a significant performance gain, especially for disease named entity recognition.
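
The abstract does not define the ‘single_entity-O’ format in detail, so the following is only an illustrative sketch of one plausible reading: B- and I- tags are collapsed into a single entity label for training (leaving fewer, better populated classes), and a simple contiguity rule restores the begin/inside boundaries afterwards. The function names `to_single_entity_o` and `restore_boundaries`, the `Disease` label, and the contiguity rule itself are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def to_single_entity_o(bio_tags):
    """Collapse B-X / I-X tags into a single X tag; 'O' is kept unchanged.

    Assumed reading of the 'single_entity-O' format, not taken from the paper.
    """
    return [tag if tag == "O" else tag.split("-", 1)[1] for tag in bio_tags]


def restore_boundaries(tags):
    """Hypothetical boundary rule: the first token of a contiguous run of
    label X becomes B-X, and every following token in that run becomes I-X."""
    restored = []
    previous = "O"
    for tag in tags:
        if tag == "O":
            restored.append("O")
        elif tag == previous:
            restored.append("I-" + tag)
        else:
            restored.append("B-" + tag)
        previous = tag
    return restored


# Toy sentence with one two-token disease mention.
tokens = ["Mutations", "in", "BRCA1", "cause", "breast", "cancer", "."]
bio = ["O", "O", "O", "O", "B-Disease", "I-Disease", "O"]

collapsed = to_single_entity_o(bio)
bio_counts = Counter(bio)              # three classes: O, B-Disease, I-Disease
collapsed_counts = Counter(collapsed)  # two classes: O, Disease
round_trip = restore_boundaries(collapsed)
```

In this toy case the two sparse classes `B-Disease` and `I-Disease` (one token each) merge into a single `Disease` class with two tokens, and the rule recovers the original tags. A contiguity rule of this kind cannot separate two directly adjacent mentions of the same type, so the rules used in the paper may well be more elaborate.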

Data Availability

The data that support the findings of this study are publicly available.

Author information

Authors and Affiliations

Computer Science and Engineering, National Institute of Technology, Calicut, Kerala, India
S. M. Archana & Jay Prakash
Computer Science and Engineering, ABV-Indian Institute of Information Technology and Management, Gwalior, Madhya Pradesh, India
Pramod Kumar Singh
Electronics and Communication Engineering, National Institute of Technology, Calicut, Kerala, India
Waquar Ahmed

Corresponding author

Correspondence to S. M. Archana.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Archana, S.M., Prakash, J., Singh, P.K. et al. An Effective Biomedical Named Entity Recognition by Handling Imbalanced Data Sets Using Deep Learning and Rule-Based Methods. SN COMPUT. SCI. 4, 650 (2023). https://doi.org/10.1007/s42979-023-02068-6

Received: 06 March 2023
Accepted: 19 June 2023
Published: 28 August 2023
DOI: https://doi.org/10.1007/s42979-023-02068-6
