Native-language identification is the task of determining a speaker’s native language based only on their speech in a second language. In this paper we propose using the well-known i-vector representation of the speech signal to detect the native language of an English speaker. The i-vector representation has shown excellent performance on the closely related task of distinguishing between languages. We evaluated different ways of extracting i-vectors in order to adapt them to the specific characteristics of the native-language detection task. Experimental results on the 2016 ComParE Native Language Sub-Challenge test set show that the proposed system, based on a conventional i-vector extractor, outperforms the baseline system with a 42% relative improvement.
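As a rough illustration of what extracting an i-vector involves, the sketch below implements the standard total-variability formulation in NumPy. It assumes a diagonal-covariance GMM-UBM and an already trained total variability matrix `T` (training `T` is omitted). The function name `extract_ivector` and its argument layout are hypothetical. This is not the authors' system and does not reproduce the extractor variants the paper evaluates. The i-vector is the posterior mean w = (I + TᵀΣ⁻¹NT)⁻¹ TᵀΣ⁻¹F̃, computed from zeroth-order (N) and centered first-order (F̃) Baum-Welch statistics.

```python
import numpy as np


def extract_ivector(frames, ubm_means, ubm_covs, ubm_weights, T):
    """Hypothetical sketch of standard i-vector extraction.

    frames:      (n, D) acoustic features (e.g. MFCCs) of one utterance
    ubm_means:   (C, D) UBM component means
    ubm_covs:    (C, D) diagonal UBM covariances
    ubm_weights: (C,)   UBM mixture weights
    T:           (C*D, R) total variability matrix (assumed pre-trained)
    """
    C, D = ubm_means.shape
    R = T.shape[1]

    # Per-frame component posteriors under the UBM, computed in the log domain.
    log_det = np.sum(np.log(ubm_covs), axis=1)               # (C,)
    diff = frames[:, None, :] - ubm_means[None, :, :]        # (n, C, D)
    log_lik = -0.5 * (np.sum(diff ** 2 / ubm_covs[None], axis=2)
                      + log_det + D * np.log(2 * np.pi))     # (n, C)
    log_post = np.log(ubm_weights) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                  # (n, C)

    # Zeroth-order and centered first-order Baum-Welch statistics.
    N = post.sum(axis=0)                                     # (C,)
    F = post.T @ frames - N[:, None] * ubm_means             # (C, D)

    # Posterior mean of the latent factor:
    # w = (I + T' Sigma^-1 N T)^-1 T' Sigma^-1 F
    inv_sigma = (1.0 / ubm_covs).reshape(-1)                 # (C*D,)
    N_expanded = np.repeat(N, D)                             # (C*D,)
    TtSinv = T.T * inv_sigma                                 # (R, C*D)
    precision = np.eye(R) + (TtSinv * N_expanded) @ T        # (R, R)
    return np.linalg.solve(precision, TtSinv @ F.reshape(-1))
```

In a native-language detection setup, one such fixed-length vector would be extracted per utterance and then passed to a back-end classifier. The choice of back-end is not specified here.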
Cite as: Senoussaoui, M., Cardinal, P., Dehak, N., Koerich, A.L. (2016) Native Language Detection Using the I-Vector Framework. Proc. Interspeech 2016, 2398-2402, doi: 10.21437/Interspeech.2016-1473
@inproceedings{senoussaoui16_interspeech, author={Mohammed Senoussaoui and Patrick Cardinal and Najim Dehak and Alessandro L. Koerich}, title={{Native Language Detection Using the I-Vector Framework}}, year=2016, booktitle={Proc. Interspeech 2016}, pages={2398--2402}, doi={10.21437/Interspeech.2016-1473} }