Boosting-Based Ensemble Learning with Penalty Setting Profiles for Automatic Thai Unknown Word Recognition

  • Conference paper
Computational Collective Intelligence. Technologies and Applications (ICCCI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6422)

Abstract

Boosting-based ensemble learning can improve classification accuracy by constructing multiple classification models, each built to cope with the errors of the preceding steps. This paper applies boosting-based ensemble learning with penalty setting profiles to automatic unknown word recognition in Thai. Treating a sequential task as a non-sequential problem requires us to rank a set of generated candidates for each potential unknown word position. Since the correct candidate might not be located at the highest rank among the candidates in the set, the proposed method assigns penalties, in the form of a penalty setting profile, to improper rankings in order to reconstruct the succeeding classification model. In addition, a number of alternative penalty setting profiles are introduced, and their performance is compared on the task of extracting unknown words from a large Thai medical text. Using naïve Bayes as the base classifier for ensemble learning, the proposed method achieves an accuracy of 89.24%, an improvement of 9.91%, 7.54%, and 5.25% over conventional naïve Bayes, the non-ensemble version, and the flat penalty setting profile, respectively.
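
The abstract does not give the form of the penalty setting profiles, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes each training item is a candidate set with a known rank for the correct candidate, and that a profile maps that rank to a penalty used in an exponential, boosting-style weight update. The names `flat_profile`, `linear_profile`, `reweight`, and the `rate` parameter are hypothetical.

```python
import math

def flat_profile(rank, n_candidates):
    """Hypothetical flat profile: one fixed penalty whenever the
    correct candidate is not ranked first."""
    return 0.0 if rank == 1 else 1.0

def linear_profile(rank, n_candidates):
    """Hypothetical linear profile: the penalty grows with the distance
    of the correct candidate from rank 1, scaled to [0, 1]."""
    if n_candidates <= 1:
        return 0.0
    return (rank - 1) / (n_candidates - 1)

def reweight(weights, ranks, sizes, profile, rate=1.0):
    """Boosting-style update (an assumption, not taken from the paper):
    multiply each candidate set's weight by exp(rate * penalty), then
    normalize so the weights sum to 1."""
    raw = [w * math.exp(rate * profile(r, n))
           for w, r, n in zip(weights, ranks, sizes)]
    total = sum(raw)
    return [x / total for x in raw]

# Four candidate sets of four candidates each; the correct candidate
# was ranked 1st, 2nd, 4th, and 1st by the current model.
weights = [0.25, 0.25, 0.25, 0.25]
ranks = [1, 2, 4, 1]
sizes = [4, 4, 4, 4]

flat = reweight(weights, ranks, sizes, flat_profile)
linear = reweight(weights, ranks, sizes, linear_profile)
```

Under these assumptions, the flat profile gives the second and third sets the same increased weight, while the linear profile gives the third set (correct candidate ranked last) more weight than the second. The next model in the ensemble would then be trained with the updated weights.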

This work was partially funded by NECTEC of Thailand via research grant for Automatic Tagger for Named Entity in Thai News Corpus Project (NT-B-22-KE-38-52-01).

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

TeCho, J., Nattee, C., Theeramunkong, T. (2010). Boosting-Based Ensemble Learning with Penalty Setting Profiles for Automatic Thai Unknown Word Recognition. In: Pan, JS., Chen, SM., Nguyen, N.T. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2010. Lecture Notes in Computer Science, vol 6422. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16732-4_15

  • DOI: https://doi.org/10.1007/978-3-642-16732-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16731-7

  • Online ISBN: 978-3-642-16732-4

  • eBook Packages: Computer Science, Computer Science (R0)
