MorphBen: A Neural Morphological Analyzer for Bengali Language

Das, Ayan; Sarkar, Sudeshna

doi:10.1007/978-3-031-24337-0_42

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

Rule-based systems built on two-level morphology tag the morphological features of Bengali words quite well: they can predict all possible morphological derivations for standard word forms whose roots occur in the dictionary. However, many words have multiple morphological derivations, and the correct one depends on the context of the word. Non-dictionary words are also very frequent. Machine learning methods that take the context of a word into account have been used to predict the values of its morphological features. Although these systems can, to some extent, disambiguate words with multiple possible values, they need to be improved to make more efficient use of character-level information. Character-level information is particularly important for analyzing out-of-vocabulary (OOV) words, which are not seen in the training data. We propose a method that uses both the context of a word and its constituent characters to develop a high-quality morphological analyzer for Bengali. We show that combining character-level information with contextual information improves the performance of the morphological analyzer, both on OOV words and in predicting the correct analysis for instances of words that can have multiple morphological derivations.
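As a rough illustration of what combining character-level and contextual information can look like at the input side, the sketch below builds a feature record for one token from its character n-grams and its neighbouring words. This is not the architecture described in the paper, which the abstract does not specify; the function names `char_ngrams` and `token_features`, the boundary markers `<` and `>`, the `<PAD>` symbol, and the window size are all assumptions made for the example.

```python
from typing import Dict, List


def char_ngrams(word: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Return the character n-grams of a word padded with boundary markers.

    Because n-grams are shared across words, an unseen (OOV) word still
    yields sub-word units that may have been observed in training data.
    """
    padded = f"<{word}>"  # hypothetical boundary markers
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def token_features(sentence: List[str], index: int, window: int = 1) -> Dict[str, object]:
    """Combine character-level and contextual information for one token."""
    word = sentence[index]
    features: Dict[str, object] = {
        "word": word,
        "char_ngrams": char_ngrams(word),
    }
    # Neighbouring words supply the context needed to choose between
    # several possible analyses of the same surface form.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = index + offset
        inside = 0 <= j < len(sentence)
        features[f"context[{offset:+d}]"] = sentence[j] if inside else "<PAD>"
    return features
```

A downstream tagger, neural or otherwise, could consume such records; the placeholder tokens used to exercise the function (for example `["w1", "w2", "w3"]`) stand in for real Bengali word forms.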



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, 721302, WB, India
Ayan Das & Sudeshna Sarkar

Corresponding author

Correspondence to Ayan Das.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Das, A., Sarkar, S. (2023). MorphBen: A Neural Morphological Analyzer for Bengali Language. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_42

DOI: https://doi.org/10.1007/978-3-031-24337-0_42
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)
