Supervised Machine Learning Approach for Bio-molecular Event Extraction

Ekbal, Asif; Majumder, Amit; Hasanuzzaman, Mohammad; Saha, Sriparna

doi:10.1007/978-3-642-27242-4_27

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7077)

Included in the following conference series:

International Conference on Swarm, Evolutionary, and Memetic Computing

Abstract

The main goal of biomedical text mining is to capture biomedical phenomena from textual data by extracting relevant entities, information, and relations between biomedical entities such as proteins and genes. Most research in this area has focused on extracting only binary relations. More recently, the focus has shifted towards extracting more complex relations in the form of bio-molecular events, which may involve several entities or other relations. In this paper we propose a supervised approach for the extraction, i.e. identification and classification, of relatively complex bio-molecular events. We treat this as a supervised machine learning problem and use a well-known statistical algorithm, Conditional Random Fields (CRF), with statistical and linguistic features that represent morphological, syntactic and contextual information about candidate bio-molecular trigger words. First, we treat event identification and classification as a two-step process: the first step identifies events, and the second classifies each identified event into one of nine predefined classes. We then perform event identification and classification jointly. Three-fold cross-validation experiments on the Biomedical Natural Language Processing (BioNLP) 2009 shared task datasets yield overall average recall, precision and F-measure values of 58.88%, 74.53% and 65.79%, respectively, for event identification, and an overall classification accuracy of 59.34%. When identification and classification are performed jointly, the proposed approach achieves overall recall, precision and F-measure values of 59.92%, 54.25% and 56.94%, respectively.
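The F-measure figures in the abstract are consistent with the standard balanced F-measure, the harmonic mean of precision P and recall R, F = 2PR / (P + R). The minimal Python sketch below recomputes F from the reported precision and recall values. The function name `f_measure` and its `beta` parameter are illustrative assumptions, not taken from the paper. Under that standard definition, it reproduces the reported 65.79% and 56.94% to two decimals.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Hypothetical helper for illustration; inputs may be fractions or percentages.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall values (in %) as reported in the abstract.
identification = {"precision": 74.53, "recall": 58.88}
joint = {"precision": 54.25, "recall": 59.92}

for name, scores in [("identification", identification), ("joint", joint)]:
    f = f_measure(scores["precision"], scores["recall"])
    print(f"{name}: F = {f:.2f}")
# prints:
# identification: F = 65.79
# joint: F = 56.94
```

Because the harmonic mean is pulled towards the smaller of the two scores, the joint setting's F-measure stays close to its lower precision even though its recall is slightly higher than in the identification-only setting.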

All authors contributed equally to the paper.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
Asif Ekbal & Sriparna Saha
Academy of Technology, Kolkata, India
Amit Majumder
WBIDCL, Kolkata, India
Mohammad Hasanuzzaman

Editor information

Editors and Affiliations

Department of Electrical Engineering, IIT, Delhi, India
Bijaya Ketan Panigrahi
School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore
Ponnuthurai Nagaratnam Suganthan
Department of Electronics and Telecommunications, Jadavpur University, 700032, Kolkata, India
Swagatam Das
ANITS, Visakhapatnam, India
Suresh Chandra Satapathy

About this paper

Cite this paper

Ekbal, A., Majumder, A., Hasanuzzaman, M., Saha, S. (2011). Supervised Machine Learning Approach for Bio-molecular Event Extraction. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Satapathy, S.C. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2011. Lecture Notes in Computer Science, vol 7077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27242-4_27

DOI: https://doi.org/10.1007/978-3-642-27242-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27241-7
Online ISBN: 978-3-642-27242-4
eBook Packages: Computer Science; Computer Science (R0)