Indian Native Language Identification - INLI 2018

Authors:

Anand Kumar M
Department of Information Technology, National Institute of Technology, Karnataka, Surathkal, Mangalore

Barathi Ganesh HB
Center for Computational Engineering & Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India

Soman KP
Center for Computational Engineering & Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India

Ajay SG
Arnekt Solutions Pvt. Ltd., Magarpatta City, Pune, Maharashtra, India

FIRE '18: Proceedings of the 10th Annual Meeting of the Forum for Information Retrieval Evaluation, December 2018, Pages 11–15
https://doi.org/10.1145/3293339.3293342

Published: 06 December 2018

ABSTRACT

The growth of digital platforms enables industries to offer user-specific services. Most of the time, information about internet users is not explicitly available, which acts as a constraint in developing personalized applications. Hence the need for author profiling tasks, which aim to predict internet users' characteristics from their texts. Native Language Identification is one such author profiling task: it predicts an author's native language from texts written in another language. We propose the Indian Native Language Identification task, in which internet users' texts are written in English and participants need to determine whether the user's native language is Tamil, Malayalam, Kannada, Telugu, Bengali, or Hindi. The corpus is collected from the Facebook pages of regional newspapers, under the hypothesis that a user who belongs to a particular region will read the news from the respective regional newspaper.
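The task described above amounts to six-way text classification: English text in, one native-language label out. The following is a minimal sketch of that setup, assuming a character n-gram Naive Bayes classifier. It is not the organizers' baseline or any participant's system; the class `NgramNaiveBayes`, the helper `char_ngrams`, and the placeholder training strings are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# The six native-language classes named in the abstract.
LABELS = ["Tamil", "Malayalam", "Kannada", "Telugu", "Bengali", "Hindi"]


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing.

    Illustrative only: an assumed stand-in for whatever model a participant
    might actually use.
    """

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)            # documents per class
        self.ngram_counts = defaultdict(Counter)     # n-gram counts per class
        for text, label in zip(texts, labels):
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.ngram_counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.ngram_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label in self.class_docs:
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.class_docs[label] / n_docs)
            denom = self.totals[label] + len(self.vocab)
            for gram in char_ngrams(text):
                score += math.log((self.ngram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Synthetic placeholder strings, NOT taken from the INLI corpus.
    train_texts = ["aaa aaa aaa", "zzz zzz zzz"]
    train_labels = ["Tamil", "Hindi"]
    model = NgramNaiveBayes().fit(train_texts, train_labels)
    print(model.predict("aaa"))
```

In a real setting the training texts would be the English comments in the task corpus, each labeled with one of the six entries in `LABELS`; character n-grams are used here only because they are a simple, common feature choice for this kind of problem.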

Published in
FIRE '18: Proceedings of the 10th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2018
68 pages
ISBN: 9781450362085
DOI: 10.1145/3293339
Editors: Prasenjit Majumder, Mandar Mitra, Jainisha Sankhavara, Parth Mehta
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Author Profiling
FIRE 2018
Indian Languages
Native Language Identification
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 19 of 64 submissions, 30%