Learning to Learn Biological Relations from a Small Training Set

Alonso i Alemany, Laura; Bruno, Santiago

doi:10.1007/978-3-642-00382-0_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

In this paper we present several ways to improve a basic machine learning approach to identifying relations between biological named entities, as annotated in the Genia corpus.

The main difficulty in learning from the Genia event-annotated corpus is the small number of examples available for each relation type. We compare three ways to address this data sparseness problem: using the corpus as the initial seed of a bootstrapping procedure, generalizing classes of relations via the Genia ontology, and generalizing classes via clustering.
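As an illustration of the second strategy, the following is a minimal sketch of how sparse relation classes might be generalized through an ontology: any class with too few examples is backed off to its parent class until every remaining class is large enough or has no parent left. The hierarchy `TOY_PARENT`, the function `generalize_labels`, and the threshold `min_count` are assumptions introduced here for illustration; they are not the actual Genia ontology, nor necessarily the procedure used in the paper.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent), for illustration only.
# This is NOT the actual Genia ontology.
TOY_PARENT = {
    "Positive_regulation": "Regulation",
    "Negative_regulation": "Regulation",
    "Transcription": "Gene_expression",
    "Regulation": "Event",
    "Gene_expression": "Event",
    "Binding": "Event",
}


def generalize_labels(labels, parent, min_count):
    """Back off sparse class labels to their parent class.

    A class is 'sparse' if it has fewer than min_count examples.
    Sparse classes are replaced by their parent, repeatedly, until
    no sparse class with a parent remains. Assumes the hierarchy
    is acyclic, so the loop terminates.
    """
    labels = list(labels)
    while True:
        counts = Counter(labels)
        # Sparse classes that still have a parent to back off to.
        sparse = {c for c, n in counts.items() if n < min_count and c in parent}
        if not sparse:
            return labels
        labels = [parent[label] if label in sparse else label for label in labels]
```

In this sketch, classes that already have enough examples keep their specific label, so the resulting label set can mix fine-grained and general classes. Merging all siblings whenever one of them is sparse would be an equally plausible design.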

Author information

Authors and Affiliations

NLP Group Facultad de Matemática Astronomía y Física (FaMAF), UNC, Córdoba, Argentina
Laura Alonso i Alemany & Santiago Bruno

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Alonso i Alemany, L., Bruno, S. (2009). Learning to Learn Biological Relations from a Small Training Set. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_34

DOI: https://doi.org/10.1007/978-3-642-00382-0_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00381-3
Online ISBN: 978-3-642-00382-0
eBook Packages: Computer Science, Computer Science (R0)
