DOI: 10.1145/3632754.3633075
extended-abstract

CoLI@FIRE2023: Findings of Word-level Language Identification in Code-mixed Tulu Text

Published: 12 February 2024

Abstract

The word-level Language Identification (LI) task determines the language of each word in a code-mixed sentence, that is, a sentence whose words belong to more than one language at the word or sub-word level. This task has been explored extensively for code-mixed text in high-resource languages such as Spanish, French, and German, whereas it has received little attention in a few under-resourced languages and has not yet been addressed in a few others. In view of this, the "CoLI-Tunglish: Word-level Language Identification in Code-mixed Tulu Texts" shared task at the Forum for Information Retrieval Evaluation (FIRE) 2023 invited researchers to develop learning models for word-level LI in code-mixed Tulu text. The CoLI-Tunglish dataset mixes three languages (Tulu, Kannada, and English) at the word/sub-word level, and the objective is to assign one of seven predefined labels to each word in a given sentence: Tulu, Kannada, English, Mixed (a combination of Tulu, Kannada, and/or English), Name, Location, and Other. This paper gives an overview of the methodologies and results of the five teams, out of 14 registered, that submitted a total of 10 runs. Among all the submitted models, the top-performing one obtained a macro F1 score of 0.81. The results achieved by the participating teams indicate a promising direction for tackling word-level LI in code-mixed Tulu text. They offer useful insights and potential solutions, opening new avenues of research in language technologies for code-mixed Tulu text.
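As an illustration of the evaluation metric named in the abstract, the following minimal Python sketch computes a macro F1 score over the seven labels by averaging per-label F1 scores with equal weight. It is not the shared task's scoring script: the function name `macro_f1`, the toy label sequences, and the choice to skip labels that appear in neither the gold nor the predicted sequence are assumptions made here for illustration.

```python
from collections import Counter

# The seven labels listed in the abstract.
LABELS = ["Tulu", "Kannada", "English", "Mixed", "Name", "Location", "Other"]


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1 over word-level labels.

    Assumption: labels that occur in neither `gold` nor `pred` are
    skipped rather than counted as zero.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the gold label g

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        if denom == 0:
            continue  # label absent from both sequences
        scores.append(2 * tp[label] / denom)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical four-word sentence: one label per word (toy data, not from the dataset).
gold = ["Tulu", "English", "Tulu", "Kannada"]
pred = ["Tulu", "English", "Kannada", "Kannada"]
score = macro_f1(gold, pred)
```

Because every label contributes equally to the average, a system that handles rare labels such as Mixed or Location poorly is penalized as much as one that handles a frequent label poorly, which is presumably why a macro-averaged score suits a dataset with uneven label frequencies.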

Published In

FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2023
170 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Language Identification
2. Sequence Labeling
3. Tulu
4. Word-level

Qualifiers

• Extended-abstract
• Research
• Refereed limited

Conference

FIRE 2023

Acceptance Rates

Overall Acceptance Rate 19 of 64 submissions, 30%
