
Character-level inclusive transformer architecture for information gain in low resource code-mixed language

  • S.I.: 2020 India Intl. Congress on Computational Intelligence
Neural Computing and Applications

Abstract

The use of code-mixed language on social media platforms is very common for informal communication and has immense importance in a multilingual society such as India. Implementing various NLP tasks on code-mixed language for machine comprehension and NLP applications is the need of the hour. Implementing complex learning models, however, is difficult due to the scarcity of available code-mixed resources. Possible solutions are to design more effective architectures that learn from low-resource datasets and to adopt transfer learning settings. We propose an improvised transformer network (Character Inclusion Transformer) that utilizes and learns the character-level information available in the words of code-mixed sentences. The proposed model improves the performance of the transformer model when trained from scratch on low-resource code-mixed datasets. We also propose two further architecture settings that are useful for a transfer learning strategy using the pre-trained mBERT model. Three basic word-level tagging NLP tasks, i.e., NER, POS tagging, and Language Identification (LID), are considered in the paper, where Language Identification is specific to code-mixed language. Six separate datasets, namely IIITH NER, LID FIRE, LID ICON, LID UD, POS ICON, and POS UD, have been tested, and results are reported using weighted and macro averages of precision, recall, and F1 score.
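The abstract states the central idea but not the full architecture. The sketch below illustrates one plausible reading of it, in which character embeddings are pooled per word and fused with the word embedding before a standard transformer encoder with a per-token tagging head; the class name, dimensions, and max-pooling fusion are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch (NOT the authors' exact model): a word-level tagger that
# augments word embeddings with pooled character-level features before a
# standard transformer encoder. All sizes and names are illustrative.
import torch
import torch.nn as nn

class CharInclusiveTagger(nn.Module):
    def __init__(self, word_vocab, char_vocab, n_tags,
                 d_word=128, d_char=32, d_model=160, n_heads=4, n_layers=2):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, d_word, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, d_char, padding_idx=0)
        # Fuse word and pooled character features into the model dimension.
        self.proj = nn.Linear(d_word + d_char, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_tags)  # per-token tag scores

    def forward(self, words, chars):
        # words: (batch, seq); chars: (batch, seq, max_word_len)
        w = self.word_emb(words)                     # (B, S, d_word)
        c = self.char_emb(chars).max(dim=2).values   # pool chars per word
        x = self.proj(torch.cat([w, c], dim=-1))     # (B, S, d_model)
        return self.head(self.encoder(x))            # (B, S, n_tags)

# Toy usage: 2 sentences, 5 words each, up to 8 characters per word.
model = CharInclusiveTagger(word_vocab=1000, char_vocab=100, n_tags=7)
words = torch.randint(1, 1000, (2, 5))
chars = torch.randint(1, 100, (2, 5, 8))
print(model(words, chars).shape)  # torch.Size([2, 5, 7])
```

The same per-token head serves all three tagging tasks considered in the paper (NER, POS tagging, LID); only the tag inventory changes.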





Author information


Corresponding author

Correspondence to Rajat Subhra Bhowmick.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Tag-wise results

The precision, recall, and F1 score for each tag in all six datasets are shown in the tables below for the weighted and macro averages, along with the respective support values. Tables 12, 13, 14, 15, 16 and 17 show the performance on the POS ICON, POS UD, LID FIRE, LID ICON, LID UD, and IIITH NER datasets, respectively.

Table 12 Comparison of precision, recall, and F1 score for each tag on the POS ICON dataset between the direct transformer and the proposed character-level inclusive transformer
Table 13 Comparison of precision, recall, and F1 score for each tag on the POS UD dataset between the direct transformer and the proposed character-level inclusive transformer
Table 14 Comparison of precision, recall, and F1 score for each tag on the LID FIRE dataset between the direct transformer and the proposed character-level inclusive transformer
Table 15 Comparison of precision, recall, and F1 score for each tag on the LID ICON dataset between the direct transformer and the proposed character-level inclusive transformer
Table 16 Comparison of precision, recall, and F1 score for each tag on the LID UD dataset between the direct transformer and the proposed character-level inclusive transformer
Table 17 Comparison of precision, recall, and F1 score for each tag on the IIITH NER dataset between the direct transformer and the proposed character-level inclusive transformer
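As a pointer for reproducing such tables: a per-tag precision/recall/F1 breakdown with support values, plus macro-average and weighted-average rows, is the standard output of scikit-learn's classification_report. The example below uses invented LID tags and predictions purely for illustration; none of it is data from the paper.

```python
# Toy demonstration of per-tag precision/recall/F1 with support, plus
# macro and weighted averages, via scikit-learn. The tags and predictions
# are made up for illustration only.
from sklearn.metrics import classification_report

# Flattened gold and predicted token-level tags for a toy LID task.
y_true = ["en", "hi", "hi", "other", "en", "hi", "en", "other"]
y_pred = ["en", "hi", "en", "other", "en", "hi", "hi", "other"]

print(classification_report(y_true, y_pred, digits=4))
# The report lists precision, recall, F1, and support per tag, followed by
# the macro avg (unweighted mean over tags) and the weighted avg
# (support-weighted mean), i.e., the same columns as Tables 12-17.
```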


About this article


Cite this article

Bhowmick, R.S., Ganguli, I. & Sil, J. Character-level inclusive transformer architecture for information gain in low resource code-mixed language. Neural Comput & Applic 37, 559–577 (2025). https://doi.org/10.1007/s00521-022-06983-2
