Predicting Second Language Proficiency Level Using Linguistic Cognitive Task and Machine Learning Techniques

Yang, YeongWook; Yu, WonHee; Lim, HeuiSeok

doi:10.1007/s11277-015-3062-2

Published: 11 September 2015

Volume 86, pages 271–285 (2016)

Wireless Personal Communications

Abstract

This paper proposes a novel method for predicting second language (L2) proficiency from linguistic cognitive ability, as measured by a linguistic cognitive response test. The method rests on the assumption that language aptitude test scores correlate with linguistic cognitive ability, and it takes the learner’s linguistic cognition aptitude data as input. In our experiment, the method produced promising results, with predictive power as high as 70 %. Linguistic cognitive ability is measured through four linguistic cognition tasks: a reading lexical decision task (LDT), a listening LDT, a translation recognition task, and a semantic recognition task. Each type of task is related to a different linguistic function in the brain. Once the learner’s linguistic cognitive aptitude has been measured, the result is fed as input to a machine learning model, which predicts the corresponding language proficiency level. To train the proficiency classifier, we used a multi-layer perceptron, Naive Bayes, logistic regression, and a random forest model. The input data set for our experiment consisted of the results of 42 participants who took our cognitive aptitude tests. Our classifier showed an accuracy >70 % in predicting proficiency level. Among the models, the random forest produced the best predictive power.
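
The pipeline described in the abstract (task scores in, proficiency level out) can be illustrated with a minimal sketch. It uses only one of the four model families named above, a Gaussian Naive Bayes classifier written in plain Python, and it runs on synthetic data. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature names in `TASKS`, the three proficiency levels, the score distributions, and the leave-one-out evaluation scheme.

```python
import math
import random
from collections import defaultdict

# Hypothetical feature order: one score per linguistic cognition task.
TASKS = ["reading_ldt", "listening_ldt",
         "translation_recognition", "semantic_recognition"]


def make_synthetic_data(n=42, seed=0):
    """Generate placeholder data. These are NOT the study's measurements."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        level = i % 3  # three made-up proficiency levels: 0, 1, 2
        # Assumption: a higher level shifts the mean task score upward.
        features = [rng.gauss(0.5 + 0.15 * level, 0.1) for _ in TASKS]
        data.append((features, level))
    return data


def fit_gaussian_nb(rows):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features)
    model = {}
    for label, feats in by_class.items():
        stats = []
        for column in zip(*feats):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[label] = (len(feats) / len(rows), stats)
    return model


def predict(model, features):
    """Return the class with the highest log posterior."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (x - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def leave_one_out_accuracy(data):
    """Train on all rows but one, test on the held-out row, and average."""
    correct = 0
    for i, (features, label) in enumerate(data):
        model = fit_gaussian_nb(data[:i] + data[i + 1:])
        correct += predict(model, features) == label
    return correct / len(data)


data = make_synthetic_data()
accuracy = leave_one_out_accuracy(data)
print(f"Leave-one-out accuracy on synthetic data: {accuracy:.2f}")
```

With only 42 participants, holding out one learner at a time is one plausible way to use all the data for both training and testing. The accuracy printed here reflects the synthetic data only and says nothing about the results reported in the paper.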

Acknowledgments

This research was supported by the ICT R&D program of MSIP/IITP [2014, Development of distribution and diffusion service technology through individual and collective intelligence to digital contents]. This work was also supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIP) (No. NRF-2015R1A5A7037674).

Author information

Authors and Affiliations

Department of Computer Science Education, Korea University, Seoul, Korea
YeongWook Yang, WonHee Yu & HeuiSeok Lim

Corresponding author

Correspondence to HeuiSeok Lim.

About this article

Cite this article

Yang, Y., Yu, W. & Lim, H. Predicting Second Language Proficiency Level Using Linguistic Cognitive Task and Machine Learning Techniques. Wireless Pers Commun 86, 271–285 (2016). https://doi.org/10.1007/s11277-015-3062-2

Issue Date: January 2016
DOI: https://doi.org/10.1007/s11277-015-3062-2
