
Character-level inclusive transformer architecture for information gain in low resource code-mixed language

  • S.I.: 2020 India Intl. Congress on Computational Intelligence
Neural Computing and Applications

Abstract

The use of code-mixed language on social media platforms is very common for informal communication and has immense importance in a multilingual society such as India. Implementing various NLP tasks on code-mixed language for machine comprehension and NLP applications is the need of the hour. Implementing complex learning models, however, is difficult due to the scarcity of available code-mixed resources. Possible solutions are to design more effective architectures that learn from low-resource datasets and to adopt transfer learning settings. We propose an improvised transformer network (Character Inclusion Transformer) that utilizes and learns the character-level information available in the words of code-mixed sentences. The proposed model improves the performance of the transformer model when trained from scratch on low-resource code-mixed datasets. We also propose two further architecture settings that are useful for a transfer learning strategy using the pre-trained mBERT model. Three basic word-level tagging NLP tasks, i.e., NER, POS tagging, and Language Identification (LID), are considered in the paper, where Language Identification is specific to code-mixed language. Six separate datasets, namely IIITH NER, LID FIRE, LID ICON, LID UD, POS ICON, and POS UD, have been tested, and results are reported using weighted and macro averages of precision, recall, and F1 score.
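The abstract states the central idea but not the full architecture. The sketch below illustrates one plausible reading of it, in which character embeddings are pooled per word and fused with the word embedding before a standard transformer encoder with a per-token tagging head; the class name, dimensions, and max-pooling fusion are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch (NOT the authors' exact model): a word-level tagger that
# augments word embeddings with pooled character-level features before a
# standard transformer encoder. All sizes and names are illustrative.
import torch
import torch.nn as nn

class CharInclusiveTagger(nn.Module):
    def __init__(self, word_vocab, char_vocab, n_tags,
                 d_word=128, d_char=32, d_model=160, n_heads=4, n_layers=2):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, d_word, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, d_char, padding_idx=0)
        # Fuse word and pooled character features into the model dimension.
        self.proj = nn.Linear(d_word + d_char, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_tags)  # per-token tag scores

    def forward(self, words, chars):
        # words: (batch, seq); chars: (batch, seq, max_word_len)
        w = self.word_emb(words)                     # (B, S, d_word)
        c = self.char_emb(chars).max(dim=2).values   # pool chars per word
        x = self.proj(torch.cat([w, c], dim=-1))     # (B, S, d_model)
        return self.head(self.encoder(x))            # (B, S, n_tags)

# Toy usage: 2 sentences, 5 words each, up to 8 characters per word.
model = CharInclusiveTagger(word_vocab=1000, char_vocab=100, n_tags=7)
words = torch.randint(1, 1000, (2, 5))
chars = torch.randint(1, 100, (2, 5, 8))
print(model(words, chars).shape)  # torch.Size([2, 5, 7])
```

The same per-token head serves all three tagging tasks considered in the paper (NER, POS tagging, LID); only the tag inventory changes.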





Author information


Corresponding author

Correspondence to Rajat Subhra Bhowmick.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Tag-wise results

The precision, recall, and F1 score for each tag in all six datasets are shown in the tables below for the weighted and macro averages, along with the respective support values. Tables 12, 13, 14, 15, 16 and 17 show the performance on the POS ICON, POS UD, LID FIRE, LID ICON, LID UD, and IIITH NER datasets, respectively.

Table 12 Comparison of precision, recall, and F1 score for each tag on the POS ICON dataset between the direct transformer and the proposed character-level inclusive transformer
Table 13 Comparison of precision, recall, and F1 score for each tag on the POS UD dataset between the direct transformer and the proposed character-level inclusive transformer
Table 14 Comparison of precision, recall, and F1 score for each tag on the LID FIRE dataset between the direct transformer and the proposed character-level inclusive transformer
Table 15 Comparison of precision, recall, and F1 score for each tag on the LID ICON dataset between the direct transformer and the proposed character-level inclusive transformer
Table 16 Comparison of precision, recall, and F1 score for each tag on the LID UD dataset between the direct transformer and the proposed character-level inclusive transformer
Table 17 Comparison of precision, recall, and F1 score for each tag on the IIITH NER dataset between the direct transformer and the proposed character-level inclusive transformer
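As a pointer for reproducing such tables: a per-tag precision/recall/F1 breakdown with support values, plus macro-average and weighted-average rows, is the standard output of scikit-learn's classification_report. The example below uses invented LID tags and predictions purely for illustration; none of it is data from the paper.

```python
# Toy demonstration of per-tag precision/recall/F1 with support, plus
# macro and weighted averages, via scikit-learn. The tags and predictions
# are made up for illustration only.
from sklearn.metrics import classification_report

# Flattened gold and predicted token-level tags for a toy LID task.
y_true = ["en", "hi", "hi", "other", "en", "hi", "en", "other"]
y_pred = ["en", "hi", "en", "other", "en", "hi", "hi", "other"]

print(classification_report(y_true, y_pred, digits=4))
# The report lists precision, recall, F1, and support per tag, followed by
# the macro avg (unweighted mean over tags) and the weighted avg
# (support-weighted mean), i.e., the same columns as Tables 12-17.
```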


About this article


Cite this article

Bhowmick, R.S., Ganguli, I. & Sil, J. Character-level inclusive transformer architecture for information gain in low resource code-mixed language. Neural Comput & Applic 37, 559–577 (2025). https://doi.org/10.1007/s00521-022-06983-2
