Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3513)

Abstract

Named Entity Recognition (NER) in biomedical literature is crucial to automating the construction of biomedical knowledge bases. In this paper, both empirical rule-based and statistical approaches to protein entity recognition are presented and evaluated on a general corpus, GENIA 3.02p, and a new domain-specific corpus, SRC. Experimental results show that the rules derived from SRC are useful even though they are simpler and more general than those used by other rule-based approaches. In addition, a concise HMM-based model with a rich set of features is presented and shown to be robust and competitive with other successful hybrid models. The resolution of coordination variants, which are common in entity recognition, is also addressed: by applying heuristic rules and a clustering strategy, the presented resolver is shown to be feasible.
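
For readers unfamiliar with HMM-based tagging, the following is a minimal sketch of Viterbi decoding over B/I/O labels for protein names. It is not the model from the paper: the feature classes in `word_class`, the keyword list, and every probability in `START`, `TRANS`, and `EMIT` are invented for illustration, and the paper's actual feature set is not shown on this page.

```python
import math

STATES = ["B", "I", "O"]  # begin / inside / outside a protein name

# Toy parameters, invented for illustration only (not taken from the paper).
START = {"B": 0.3, "I": 0.0, "O": 0.7}
TRANS = {
    "B": {"B": 0.1, "I": 0.5, "O": 0.4},
    "I": {"B": 0.1, "I": 0.4, "O": 0.5},
    "O": {"B": 0.3, "I": 0.0, "O": 0.7},
}
EMIT = {
    "B": {"alnum": 0.7, "keyword": 0.1, "other": 0.2},
    "I": {"alnum": 0.2, "keyword": 0.6, "other": 0.2},
    "O": {"alnum": 0.05, "keyword": 0.05, "other": 0.9},
}


def word_class(token):
    """Map a token to a coarse, hypothetical orthographic feature class."""
    if any(c.isdigit() for c in token) and any(c.isalpha() for c in token):
        return "alnum"  # mixed letters and digits, e.g. "IL-2"
    if token.lower() in {"receptor", "kinase", "protein"}:
        return "keyword"
    return "other"


def log(p):
    """Log-probability that maps zero probability to -inf instead of failing."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the most likely B/I/O label sequence for the given tokens."""
    if not tokens:
        return []
    obs = [word_class(t) for t in tokens]
    # scores[t][s]: best log-probability of any path ending in state s at step t
    scores = [{s: log(START[s]) + log(EMIT[s][obs[0]]) for s in STATES}]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log(TRANS[p][s]))
            cur[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][o])
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["the", "IL-2", "receptor", "binds"]))
```

With these toy numbers the decoder labels "IL-2 receptor" as a single two-token entity. In a real system the transition and emission tables would be estimated from an annotated corpus such as GENIA rather than written by hand.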

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liang, T., Shih, PK. (2005). Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_6

  • DOI: https://doi.org/10.1007/11428817_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26031-8

  • Online ISBN: 978-3-540-32110-1

  • eBook Packages: Computer Science, Computer Science (R0)
