Word Boundary Identification for Myanmar Text Using Conditional Random Fields

  • Conference paper
Genetic and Evolutionary Computing (GEC 2015)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 388)

Included in the following conference series:

  • International Conference on Genetic and Evolutionary Computing

Abstract

This paper examines the effectiveness of conditional random fields (CRFs) for identifying Myanmar word boundaries within a supervised framework. Existing approaches are based on maximum matching, which appears to suffer from problems related to the way Myanmar words are composed. In our experiments, the CRF approach is compared against a maximum-matching baseline using dictionaries from the Myanmar Language Commission Dictionary (word only) and a manually segmented subset of the BTEC1 corpus. The experimental results show that the CRF model achieves considerably higher F-scores on the segmentation task than the baseline, even when the baseline is allowed to use words from the test data in its dictionary.
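
To make the baseline concrete, the following is a minimal sketch of greedy forward maximum matching together with a word-level F-score. It is not the implementation used in the paper. The function names (`max_match`, `spans`, `f_score`), the toy Latin-letter input, and the toy dictionary are all hypothetical, and the scoring convention (a predicted word counts as correct only if both of its boundaries match the reference) is an assumption, since the abstract does not state how the F-score was computed.

```python
def max_match(units, dictionary, max_len=None):
    """Greedy forward maximum matching over a sequence of units.

    `units` is a list of strings (characters here; syllables would be
    another option). `max_len` is the longest candidate, counted in units.
    """
    if max_len is None:
        # Assumption: units are single characters, so word length in
        # characters equals word length in units.
        max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(units):
        # Try the longest candidate first, then shrink one unit at a time.
        for n in range(min(max_len, len(units) - i), 0, -1):
            candidate = "".join(units[i:i + n])
            # A single unknown unit is emitted as its own word (fallback).
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


def spans(words):
    """Convert a word list into a set of (start, end) character offsets."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def f_score(gold, predicted):
    """Word-level F-score: a word is correct if both boundaries match."""
    g, p = spans(gold), spans(predicted)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the greedy choice of "cats" blocks "cat" + "sat".
toy_dictionary = {"the", "cat", "cats", "at", "sat"}
predicted = max_match(list("thecatsat"), toy_dictionary)
gold = ["the", "cat", "sat"]
print(predicted, f_score(gold, predicted))
```

In this toy case every word of the reference segmentation is in the dictionary, yet the greedy longest-first choice still produces a wrong split. This is consistent with the observation above that dictionary coverage alone does not guarantee correct boundaries.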

Author information

Correspondence to Win Pa Pa.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Pa, W.P., Thu, Y.K., Finch, A., Sumita, E. (2016). Word Boundary Identification for Myanmar Text Using Conditional Random Fields. In: Zin, T., Lin, JW., Pan, JS., Tin, P., Yokota, M. (eds) Genetic and Evolutionary Computing. GEC 2015. Advances in Intelligent Systems and Computing, vol 388. Springer, Cham. https://doi.org/10.1007/978-3-319-23207-2_46

  • DOI: https://doi.org/10.1007/978-3-319-23207-2_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23206-5

  • Online ISBN: 978-3-319-23207-2

  • eBook Packages: Engineering (R0)