Abstract
This paper proposes a novel method for predicting second language (L2) proficiency from linguistic cognitive ability, as measured by a linguistic cognitive response test. The method rests on the assumption that language aptitude test scores correlate with linguistic cognitive ability, and it takes a learner's linguistic cognitive aptitude data as input. Linguistic cognitive ability is measured through four linguistic cognition tasks: reading lexical decision tasks (LDT), listening LDT, translation recognition tasks, and semantic recognition tasks. Each task type is related to a different linguistic function in the brain. The measured aptitude is then fed to a machine learning model, which predicts the corresponding language proficiency level. To train the proficiency classifier, we used a multi-layer perceptron, Naive Bayes, logistic regression, and a random forest model. The input data set consisted of the results of 42 participants who took our cognitive aptitude tests. The results were promising: the classifier predicted proficiency level with an accuracy above 70 %, and among the models, the random forest produced the best predictive power.
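The pipeline described in the abstract (task results, then a feature vector, then a classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which features are extracted, so per-task accuracy and mean reaction time are assumed here; the trial data are fabricated by a hypothetical `synthetic_learner` helper; a small hand-written Gaussian Naive Bayes stands in for the four model families named above; and leave-one-out evaluation over three hypothetical proficiency levels is likewise an assumption. Only the four task types and the figure of 42 participants come from the text.

```python
import math
import random
from statistics import mean, pvariance

# The four task types named in the abstract.
TASKS = ("reading_ldt", "listening_ldt",
         "translation_recognition", "semantic_recognition")


def task_features(trials):
    """Summarise one task as [accuracy, mean RT (ms) over correct trials].

    `trials` is a list of (reaction_time_ms, is_correct) pairs. Using these
    two summaries as features is an assumption, not the paper's stated design.
    """
    correct_rts = [rt for rt, ok in trials if ok]
    accuracy = len(correct_rts) / len(trials)
    return [accuracy, mean(correct_rts) if correct_rts else 0.0]


def learner_vector(results):
    """Concatenate the per-task summaries into one 8-value feature vector."""
    return [value for task in TASKS for value in task_features(results[task])]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes, a stand-in for the models in the abstract."""

    def fit(self, X, y):
        self.stats = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            self.stats[label] = (
                math.log(len(rows) / len(X)),          # log prior
                [mean(c) for c in cols],               # per-feature mean
                [pvariance(c) + 1e-6 for c in cols],   # smoothed variance
            )
        return self

    def predict(self, x):
        def log_posterior(label):
            prior, mus, variances = self.stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, mus, variances)
            )
        return max(self.stats, key=log_posterior)


def synthetic_learner(rng, level, n_trials=20):
    """Fabricated trials: a higher `level` gives more accurate, faster responses."""
    p_correct = 0.60 + 0.15 * level
    mean_rt = 1000.0 - 150.0 * level
    return {
        task: [(rng.gauss(mean_rt, 80.0), rng.random() < p_correct)
               for _ in range(n_trials)]
        for task in TASKS
    }


def leave_one_out_accuracy(X, y):
    """Train on all learners but one, test on the held-out learner, repeat."""
    hits = 0
    for i in range(len(X)):
        model = GaussianNaiveBayes().fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += model.predict(X[i]) == y[i]
    return hits / len(X)


def run_demo(seed=0, n_learners=42, n_levels=3):
    """Build 42 synthetic learners and return leave-one-out accuracy."""
    rng = random.Random(seed)
    y = [i % n_levels for i in range(n_learners)]
    X = [learner_vector(synthetic_learner(rng, level)) for level in y]
    return leave_one_out_accuracy(X, y)
```

Calling `run_demo()` returns a leave-one-out accuracy on the fabricated data only; the value says nothing about the results reported in the paper. With a sample as small as 42 learners, holding out one learner at a time is one common way to use all the data for both training and testing, which is why it is used in this sketch.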


Acknowledgments
This research was supported by the ICT R&D program of MSIP/IITP [2014, Development of distribution and diffusion service technology through individual and collective intelligence to digital contents]. This work was also supported by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MSIP) (No. NRF-2015R1A5A7037674).
Cite this article
Yang, Y., Yu, W. & Lim, H. Predicting Second Language Proficiency Level Using Linguistic Cognitive Task and Machine Learning Techniques. Wireless Pers Commun 86, 271–285 (2016). https://doi.org/10.1007/s11277-015-3062-2