Learning to Learn Biological Relations from a Small Training Set

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Abstract

In this paper we present several ways to improve a basic machine learning approach to identifying relations between biological named entities, as annotated in the Genia corpus.

The main difficulty in learning from the Genia event-annotated corpus is the small number of examples available for each relation type. We compare three ways of addressing this data sparseness problem: using the corpus as the initial seed of a bootstrapping procedure, generalizing classes of relations via the Genia ontology, and generalizing classes via clustering.
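
As a rough illustration of the first strategy, the sketch below shows a generic bootstrapping (self-training) loop: a classifier trained on a small labeled seed labels the unlabeled examples it is confident about, and those examples are added to the training set for the next round. It is not the authors' implementation. The token-count classifier, the confidence threshold, the function names, and the example labels are all assumptions chosen to keep the sketch self-contained.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Count how often each token occurs with each relation label."""
    counts = defaultdict(Counter)
    for tokens, label in labeled:
        counts[label].update(tokens)
    return counts


def predict(counts, tokens):
    """Return (best_label, confidence) from token-overlap scores.

    Confidence is the best label's share of the total score; an example
    with no known tokens gets (None, 0.0).
    """
    scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def bootstrap(seed, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set from a small seed (hypothetical helper).

    Each round retrains on everything labeled so far and promotes the
    unlabeled examples whose predicted label meets the threshold.
    """
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        counts = train(labeled)
        confident, rest = [], []
        for tokens in pool:
            label, conf = predict(counts, tokens)
            if label is not None and conf >= threshold:
                confident.append((tokens, label))
            else:
                rest.append(tokens)
        if not confident:
            break  # nothing new was learned, so stop early
        labeled.extend(confident)
        pool = rest
    return labeled, pool


# Toy data: the labels and tokens are illustrative, not taken from the corpus.
seed = [
    (["binds", "receptor"], "Binding"),
    (["regulates", "expression"], "Regulation"),
]
unlabeled = [
    ["binds", "promoter"],
    ["promoter", "site"],
    ["regulates", "transcription"],
    ["unknown", "words"],
]
labeled, remaining = bootstrap(seed, unlabeled)
```

In this toy run, the example containing only "promoter" and "site" cannot be labeled in the first round, but becomes labelable in the second round once "promoter" has been seen with a newly promoted example. That growth of coverage is the point of bootstrapping from a small seed; the same mechanism can also propagate early labeling mistakes, which is why a confidence threshold is used.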

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alonso i Alemany, L., Bruno, S. (2009). Learning to Learn Biological Relations from a Small Training Set. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_34

  • DOI: https://doi.org/10.1007/978-3-642-00382-0_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science, Computer Science (R0)
