Technical Term Recognition with Semi-supervised Learning Using Hierarchical Bayesian Language Models

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7337)

Abstract

Recognizing technical terms requires term dictionaries or tagged corpora, but compiling them is costly. Moreover, terms may have several representations and new terms keep being coined, so simply building a dictionary cannot solve the problem. In this research, to reduce the cost of creating dictionaries, we aimed to build a system that learns to recognize terminology from a small tagged corpus using semi-supervised learning. We approached the problem by combining a tag-level language model and a character-level language model based on the hierarchical Pitman-Yor language model (HPYLM).

We performed experiments on the recognition of biomedical terms. In supervised learning, we achieved an F-measure of 65%, which is 8 percentage points behind the best existing system, which relies on many domain-specific heuristics. In semi-supervised learning, our method maintained its accuracy better than existing methods as the amount of supervised data was reduced.
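
For readers unfamiliar with the metric, the following is a minimal sketch of how an F-measure is commonly computed, assuming the balanced form (F1, the harmonic mean of precision and recall). The abstract does not state which variant was used, and the function name and counts below are hypothetical illustrations, not figures from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives   # terms the system reported
    actual = true_positives + false_negatives      # terms in the gold annotation
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0  # no correct predictions: define the score as zero
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT taken from the paper:
# 60 correctly recognized terms, 20 spurious ones, 40 missed ones.
score = f_measure(true_positives=60, false_positives=20, false_negatives=40)
print(f"{score:.3f}")  # precision 0.75, recall 0.60 -> prints 0.667
```

Because the harmonic mean is pulled toward the lower of its two inputs, a system cannot reach a high F-measure by trading away recall for precision or vice versa.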



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fujii, R., Sakurai, A. (2012). Technical Term Recognition with Semi-supervised Learning Using Hierarchical Bayesian Language Models. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_42

  • DOI: https://doi.org/10.1007/978-3-642-31178-9_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31177-2

  • Online ISBN: 978-3-642-31178-9

  • eBook Packages: Computer Science, Computer Science (R0)
