Incorporating Transformer Models for Sentiment Analysis and News Classification in Khmer

Rifat, Md Rifatul Islam; Imran, Abdullah Al

doi:10.1007/978-3-030-91434-9_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13116)

Included in the following conference series:

International Conference on Computational Data and Social Networks

Abstract

In recent years, natural language modeling has achieved major breakthroughs through theoretical and technical advances. Leveraging the power of deep learning, transformer models have had a disruptive impact on natural language processing. However, the benefits of these advances are still confined to a few high-resource languages such as English, German, and French. Low-resource languages such as Khmer have yet to benefit from them because of a lack of technical support. In this study, our objective is to apply state-of-the-art language models to two empirical use cases in the Khmer language: Sentiment Analysis and News Classification. To perform the classification tasks, we employed FastText and BERT to extract word embeddings and carried out three types of experiments: FastText, BERT feature-based, and BERT fine-tuning-based. A large text corpus of over 100,000 news articles was used to pre-train the transformer model, BERT. Our experiments show that in both use cases, a pre-trained and fine-tuned BERT model produces the best results.
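
The distinction between the feature-based and fine-tuning-based settings named in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: a small hand-made embedding table stands in for FastText or BERT, the tokens are English placeholders rather than Khmer, and the mean pooling, logistic-regression head, learning rate, and all function names are assumptions chosen only for illustration. In the feature-based run the embedding table stays frozen and only the classifier is trained; in the fine-tuning run the gradients also update the embeddings.

```python
import math

# Hypothetical toy embedding table (stand-in for pre-trained vectors).
EMB = {
    "good": [1.0, 0.2], "great": [0.9, 0.1],
    "bad": [-1.0, 0.3], "awful": [-0.8, 0.2],
    "film": [0.0, 1.0],
}
# Hypothetical labelled examples: 1 = positive, 0 = negative.
DATA = [
    (["good", "film"], 1), (["great", "film"], 1),
    (["bad", "film"], 0), (["awful", "film"], 0),
]

def mean_pool(tokens, emb):
    """Average the token vectors into one fixed-size feature vector."""
    dim = len(next(iter(emb.values())))
    vec = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(emb[tok]):
            vec[i] += x / len(tokens)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, emb, fine_tune, epochs=300, lr=0.3):
    """Train a logistic-regression head on mean-pooled embeddings.

    fine_tune=False: feature-based, the embedding table is never updated.
    fine_tune=True: the embedding table is updated together with the head.
    """
    emb = {tok: list(vec) for tok, vec in emb.items()}  # work on a copy
    dim = len(next(iter(emb.values())))
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for tokens, label in data:
            feats = mean_pool(tokens, emb)
            pred = sigmoid(sum(wi * fi for wi, fi in zip(w, feats)) + b)
            err = pred - label  # gradient of the log loss w.r.t. the logit
            if fine_tune:
                # Each token occurrence contributes w[i] / len(tokens) to the logit.
                for tok in tokens:
                    for i in range(dim):
                        emb[tok][i] -= lr * err * w[i] / len(tokens)
            for i in range(dim):
                w[i] -= lr * err * feats[i]
            b -= lr * err
    return emb, w, b

def predict(tokens, emb, w, b):
    feats = mean_pool(tokens, emb)
    logit = sum(wi * fi for wi, fi in zip(w, feats)) + b
    return 1 if sigmoid(logit) >= 0.5 else 0

frozen_emb, w_f, b_f = train(DATA, EMB, fine_tune=False)
tuned_emb, w_t, b_t = train(DATA, EMB, fine_tune=True)
```

In a real setting the pooled vector would come from a pre-trained encoder rather than a lookup table, and fine-tuning would update the encoder's weights through backpropagation; the sketch only shows which parameters move in each setting.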

Author information

Authors and Affiliations

Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
Md Rifatul Islam Rifat
American International University-Bangladesh, Dhaka, Bangladesh
Abdullah Al Imran

Editor information

Editors and Affiliations

University of Central Florida, Orlando, FL, USA
David Mohaisen
Kent State University, Kent, OH, USA
Ruoming Jin

About this paper

Cite this paper

Rifat, M.R.I., Imran, A.A. (2021). Incorporating Transformer Models for Sentiment Analysis and News Classification in Khmer. In: Mohaisen, D., Jin, R. (eds) Computational Data and Social Networks. CSoNet 2021. Lecture Notes in Computer Science, vol 13116. Springer, Cham. https://doi.org/10.1007/978-3-030-91434-9_10

DOI: https://doi.org/10.1007/978-3-030-91434-9_10
Published: 04 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91433-2
Online ISBN: 978-3-030-91434-9
eBook Packages: Computer Science, Computer Science (R0)
