MorphBen: A Neural Morphological Analyzer for Bengali Language

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13451))

Abstract

Rule-based systems built on two-level morphology work quite well for tagging the morphological features of Bengali words: they can predict all possible morphological derivations for standard forms of words whose roots occur in the dictionary. However, many words have multiple morphological derivations, and the correct one depends on the context of the word. Non-dictionary words are also very frequent. Machine learning methods that take the context of a word into account have been used to predict the values of its morphological features. Although these systems can, to some extent, disambiguate words with multiple possible values, they need to be improved to make more efficient use of character-level information. Character-level information is particularly important for analyzing out-of-vocabulary (OOV) words, which are not seen in the training data. We propose a method that uses both the context of a word and its constituent characters to develop a high-quality morphological analyzer for Bengali. We show that combining character-level information with contextual information improves the performance of the morphological analyzer, both on OOV words and in predicting the correct analysis for instances of words that can have multiple morphological derivations.
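To make the two information sources named in the abstract concrete, here is a minimal Python sketch of a feature extractor that pairs character n-grams of a token with its neighbouring words. It is an illustration only, not the authors' model: the abstract does not describe the architecture, and the function names (`char_ngrams`, `token_features`), the n-gram sizes, the window size, and the romanized example tokens are all assumptions made for this sketch.

```python
from typing import Dict, List


def char_ngrams(word: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Return character n-grams of a word padded with boundary markers.

    The "<" and ">" markers (an assumption of this sketch) let prefixes
    and suffixes be told apart from word-internal character sequences.
    """
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def token_features(sentence: List[str], index: int, window: int = 1) -> Dict[str, List[str]]:
    """Combine character-level and contextual features for one token."""
    context = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the token itself
        pos = index + offset
        # Positions outside the sentence are filled with a padding symbol.
        context.append(sentence[pos] if 0 <= pos < len(sentence) else "<pad>")
    return {
        "char_ngrams": char_ngrams(sentence[index]),
        "context": context,
    }


# Hypothetical romanized tokens, used only for illustration.
sentence = ["chheleti", "boi", "porchhe"]
features = token_features(sentence, 0)

# A word absent from the sentence above still shares suffix n-grams
# with a word that is present, which is what makes OOV analysis feasible.
shared = set(char_ngrams("chheleti")) & set(char_ngrams("meyeti"))
```

In this toy setup `shared` contains the suffix n-gram `ti>`, so an unseen word ending in the same suffix overlaps with a seen one at the character level, while the `context` list carries the neighbouring words a model could use to choose among competing analyses.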

Author information

Corresponding author

Correspondence to Ayan Das.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Das, A., Sarkar, S. (2023). MorphBen: A Neural Morphological Analyzer for Bengali Language. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_42

  • DOI: https://doi.org/10.1007/978-3-031-24337-0_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24336-3

  • Online ISBN: 978-3-031-24337-0

  • eBook Packages: Computer Science, Computer Science (R0)
