Abstract
In this paper, we present different ways to improve a basic machine learning approach for identifying relations between biological named entities as annotated in the Genia corpus.
The main difficulty in learning from the Genia event-annotated corpus is the small number of examples available for each relation type. We compare different ways to address this data sparseness problem: using the corpus as the initial seed of a bootstrapping procedure, generalizing classes of relations via the Genia ontology, and generalizing classes via clustering.
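Two of the strategies named above, bootstrapping from a small seed and pooling sparse relation types under a more general class, can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the `PARENT` mapping (standing in for an ontology lookup), the bag-of-words features, the naive Bayes classifier, the `threshold` value and all example data are assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL mapping from fine-grained relation types to coarser parent
# classes, standing in for an ontology lookup. Labels are illustrative only.
PARENT = {
    "Positive_regulation": "Regulation",
    "Negative_regulation": "Regulation",
    "Binding": "Binding",
}


def generalize(label, parent=PARENT):
    """Back off a fine-grained label to its parent class to pool sparse examples."""
    return parent.get(label, label)


class NaiveBayes:
    """Multinomial naive Bayes over bags of string features, add-one smoothing."""

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in examples:
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict_proba(self, feats):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in feats:
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            scores[label] = score
        # Normalize log scores into probabilities (shifted for numerical stability).
        top = max(scores.values())
        exp = {k: math.exp(v - top) for k, v in scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


def bootstrap(seed, unlabeled, threshold=0.9, max_iter=10):
    """Generic self-training: retrain, then move confidently labeled items into the training set."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_iter):
        model = NaiveBayes().fit(labeled)
        newly, remaining = [], []
        for feats in pool:
            probs = model.predict_proba(feats)
            label, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                newly.append((feats, label))
            else:
                remaining.append(feats)
        if not newly:  # nothing confident enough: stop early
            break
        labeled.extend(newly)
        pool = remaining
    return labeled, pool


if __name__ == "__main__":
    # Toy seed set: both regulation subtypes are pooled under "Regulation".
    seed = [
        (["activates", "protein"], generalize("Positive_regulation")),
        (["inhibits", "protein"], generalize("Negative_regulation")),
        (["binds", "to", "protein"], generalize("Binding")),
    ]
    unlabeled = [["activates", "gene"], ["binds", "dna"]]
    labeled, pool = bootstrap(seed, unlabeled, threshold=0.7)
    print(len(labeled), "labeled;", len(pool), "still unlabeled")
```

A `threshold` close to 1 adds only a few high-confidence examples per round; lowering it grows the training set faster but risks propagating classification errors into later rounds.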
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alonso i Alemany, L., Bruno, S. (2009). Learning to Learn Biological Relations from a Small Training Set. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_34
DOI: https://doi.org/10.1007/978-3-642-00382-0_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00381-3
Online ISBN: 978-3-642-00382-0
eBook Packages: Computer Science, Computer Science (R0)