Machine Learning Models for Automatic Gene Ontology Annotation of Biological Texts

Jui, Jayati H.; Hauskrecht, Milos

doi:10.1007/978-3-031-34344-5_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13897)

Included in the following conference series:

International Conference on Artificial Intelligence in Medicine


Abstract

Gene ontology (GO) is a major source of biological knowledge that describes the functions of genes and gene products using a comprehensive set of controlled vocabularies, or terms, organized in a hierarchical structure. Automatic annotation of biological texts with GO terms has gained the attention of the scientific community because it helps researchers quickly identify documents, or parts of text, that relate to specific biological functions or processes. In this paper, we propose and investigate a new GO-term annotation strategy that uses a non-parametric k-nearest neighbor model and relies on various vector-based representations of documents and of the GO terms linked to these documents. Our vector representations are based on machine learning and natural language processing (NLP) models, including singular value decomposition, Word2Vec, and topic-based scoring. We evaluate the performance of our model on a large benchmark corpus using a variety of standard and hierarchical evaluation metrics.
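The abstract describes the annotation strategy only at a high level. As a hedged illustration, and not the authors' implementation, the sketch below shows the general k-nearest-neighbor pattern: embed a new document, find the k most similar annotated training documents, and score each GO term by the similarity-weighted votes of those neighbors. It also computes one common form of hierarchical precision, recall, and F-measure, in which the predicted and gold term sets are first extended with their ancestors in the ontology. Assumptions: plain bag-of-words counts with cosine similarity stand in for the SVD, Word2Vec, and topic-based representations; the toy documents, the term names (`apoptosis`, `cell_death`, `glucose_transport`, `process`), the `parents` map, `k`, and the `threshold` decision rule are all hypothetical; and the helper names `knn_annotate` and `hierarchical_prf` are illustrative only.

```python
import math
from collections import Counter, defaultdict


def bow(text):
    """Bag-of-words counts; a simple stand-in for SVD, Word2Vec or topic vectors."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_annotate(query, train, k=3, threshold=0.5):
    """Score GO terms by similarity-weighted votes of the k nearest training documents.

    `train` is a list of (text, set_of_go_terms) pairs. Returns the terms whose
    normalized score reaches `threshold` (a hypothetical decision rule).
    """
    q = bow(query)
    ranked = sorted(((cosine(q, bow(text)), terms) for text, terms in train),
                    key=lambda pair: pair[0], reverse=True)
    neighbours = ranked[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return {}  # no neighbour shares any vocabulary with the query
    scores = defaultdict(float)
    for sim, terms in neighbours:
        for term in terms:
            scores[term] += sim
    return {t: s / total for t, s in scores.items() if s / total >= threshold}


def with_ancestors(terms, parents):
    """Close a set of terms upward through a child -> parents map."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def hierarchical_prf(predicted, gold, parents):
    """Hierarchical precision, recall and F1 over ancestor-augmented term sets."""
    p = with_ancestors(predicted, parents)
    g = with_ancestors(gold, parents)
    overlap = len(p & g)
    hp = overlap / len(p) if p else 0.0
    hr = overlap / len(g) if g else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf


# Toy, hypothetical data: not drawn from the paper's benchmark corpus.
train = [
    ("caspase activation triggers apoptosis", {"apoptosis"}),
    ("apoptosis induced by caspase cleavage", {"apoptosis"}),
    ("glucose uptake in muscle cells", {"glucose_transport"}),
]
parents = {"apoptosis": ["cell_death"], "cell_death": ["process"],
           "glucose_transport": ["process"]}

print(knn_annotate("caspase mediated apoptosis", train))  # {'apoptosis': 1.0}
print(hierarchical_prf({"glucose_transport"}, {"apoptosis"}, parents))
```

Because the hierarchical metric credits shared ancestors, predicting the toy term `glucose_transport` for a document whose gold label is `apoptosis` still earns partial credit through the common root `process` (hP = 0.5, hR ≈ 0.33, hF = 0.4), whereas flat precision and recall would both be zero.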

Supported by the Defense Advanced Research Projects Agency (DARPA) through Cooperative Agreement D20AC00002 awarded by the U.S. Department of the Interior, Interior Business Center. The content of the article does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

Author information

Authors and Affiliations

University of Pittsburgh, Pittsburgh, PA, 15260, USA
Jayati H. Jui & Milos Hauskrecht

Authors

Jayati H. Jui
Milos Hauskrecht

Corresponding author

Correspondence to Jayati H. Jui.

Editor information

Editors and Affiliations

University of Murcia, Murcia, Spain
Jose M. Juarez
Universitat Jaume I, Castellón de la Plana, Spain
Mar Marcos
University of Maribor, Maribor, Slovenia
Gregor Stiglic
Brunel University London, Uxbridge, UK
Allan Tucker

About this paper

Cite this paper

Jui, J.H., Hauskrecht, M. (2023). Machine Learning Models for Automatic Gene Ontology Annotation of Biological Texts. In: Juarez, J.M., Marcos, M., Stiglic, G., Tucker, A. (eds) Artificial Intelligence in Medicine. AIME 2023. Lecture Notes in Computer Science, vol 13897. Springer, Cham. https://doi.org/10.1007/978-3-031-34344-5_24

DOI: https://doi.org/10.1007/978-3-031-34344-5_24
Published: 05 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34343-8
Online ISBN: 978-3-031-34344-5
eBook Packages: Computer Science, Computer Science (R0)