Abstract
Code-mixed language is widely used for informal communication on social media platforms and is especially important in multilingual societies such as India. Supporting NLP tasks on code-mixed text is therefore a pressing need for machine comprehension and downstream NLP applications. However, complex learning models are difficult to train because code-mixed resources are scarce. Two possible remedies are architectures that learn more effectively from low-resource datasets, and transfer learning. We propose an improvised transformer network, the Character Inclusion Transformer, that learns from the character-level information available in the words of code-mixed sentences. The proposed model improves on the performance of the transformer model when trained from scratch on low-resource code-mixed datasets. We also propose two further architecture settings suited to a transfer learning strategy based on the pre-trained mBERT model. The paper considers three basic word-level tagging tasks: named entity recognition (NER), part-of-speech (POS) tagging, and language identification (LID), of which LID is specific to code-mixed language. Six datasets, namely IIITH NER, LID FIRE, LID ICON, LID UD, POS ICON, and POS UD, are evaluated, and precision, recall, and F1 score are reported as weighted and macro averages.





Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A: Tag-wise results
The tables below report the precision, recall, and F1 score for each tag in every dataset, together with the weighted and macro averages and the respective support values. Tables 12, 13, 14, 15, 16, and 17 show the performance on the POS ICON, POS UD, LID FIRE, LID ICON, LID UD, and IIITH NER datasets, respectively.
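To make the reported quantities concrete, the following is a minimal sketch of how per-tag precision, recall, F1 score, and support can be combined into macro and weighted averages. It assumes token-level evaluation over flat gold and predicted tag sequences; the function names and the toy LID-style tags (`hi`, `en`, `univ`) are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter


def per_tag_scores(gold, pred):
    """Return {tag: (precision, recall, f1, support)} for two flat tag sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p was wrong for this token
            fn[g] += 1  # gold tag g was missed for this token
    scores = {}
    for tag in sorted(set(gold) | set(pred)):
        predicted = tp[tag] + fp[tag]
        support = tp[tag] + fn[tag]  # number of gold tokens carrying this tag
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[tag] = (precision, recall, f1, support)
    return scores


def macro_average(scores):
    """Unweighted mean over tags: every tag counts equally, however rare."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


def weighted_average(scores):
    """Mean over tags weighted by support: frequent tags dominate."""
    total = sum(s[3] for s in scores.values())
    return tuple(sum(s[i] * s[3] for s in scores.values()) / total for i in range(3))


# Toy example with hypothetical tags (not data from the paper).
gold = ["hi", "hi", "en", "en", "en", "univ"]
pred = ["hi", "en", "en", "en", "en", "hi"]

scores = per_tag_scores(gold, pred)
macro = macro_average(scores)
weighted = weighted_average(scores)

for tag, (p, r, f, support) in scores.items():
    print(f"{tag:5s} P={p:.3f} R={r:.3f} F1={f:.3f} support={support}")
print("macro    P=%.3f R=%.3f F1=%.3f" % macro)
print("weighted P=%.3f R=%.3f F1=%.3f" % weighted)
```

In this toy run the rare `univ` tag is never predicted correctly, so it pulls the macro average down much more than the weighted average. That gap is why reporting both averages is informative for tag sets with a skewed distribution.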
Cite this article
Bhowmick, R.S., Ganguli, I. & Sil, J. Character-level inclusive transformer architecture for information gain in low resource code-mixed language. Neural Comput & Applic 37, 559–577 (2025). https://doi.org/10.1007/s00521-022-06983-2
DOI: https://doi.org/10.1007/s00521-022-06983-2