SVM-Based Biological Named Entity Recognition Using Minimum Edit-Distance Feature Boosted by Virtual Examples

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

In this paper, we propose two independent solutions to the problems of spelling variants and the lack of annotated corpora, which are the main difficulties in biological named entity recognition based on SVMs (support vector machines) and other machine-learning methods. To resolve the problem of spelling variants, we propose the use of edit distance as a feature for the SVM. To resolve the lack-of-corpus problem, we propose the use of virtual examples, by which the annotated corpus can be expanded automatically, quickly, and easily. The experimental results show that introducing the edit-distance feature produces some improvement, and that the model trained on the corpus expanded with virtual examples outperforms the model trained on the original corpus. Finally, we achieved an F-measure of 71.46% (64.03% precision, 80.84% recall) in a five-category named entity recognition experiment on the GENIA corpus (version 3.0).
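
The abstract does not spell out how the edit-distance feature is computed, so the following is only a minimal sketch of one plausible reading: the smallest length-normalised edit distance between a candidate token and the entries of a name lexicon. The unit edit costs, the case folding, the normalisation, and the names `edit_distance`, `min_edit_distance_feature`, and `lexicon` are all assumptions made for illustration, not the authors' definitions.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def min_edit_distance_feature(token: str, lexicon: Iterable[str]) -> float:
    """Smallest normalised distance from `token` to any lexicon entry.

    Assumption: distances are divided by the longer string's length so the
    value lies in [0, 1]; an empty lexicon yields the maximum value 1.0.
    """
    scores = [
        edit_distance(token.lower(), entry.lower())
        / max(len(token), len(entry), 1)
        for entry in lexicon
    ]
    return min(scores) if scores else 1.0
```

Under these assumptions, a spelling variant such as `NF-kappaB` stays close to a lexicon entry `NF-kappa B` (one inserted space), so the feature value is small even though the two strings do not match exactly. A real system would likely discretise this value before passing it to the SVM.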

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yi, E., Lee, G.G., Song, Y., Park, SJ. (2005). SVM-Based Biological Named Entity Recognition Using Minimum Edit-Distance Feature Boosted by Virtual Examples. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_86

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_86

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
