Spoken Arabic dialect recognition using X-vectors

Abualsoud Hanani; Rabee Naser

doi:10.1017/S1351324920000091

Published online by Cambridge University Press: 04 May 2020

Abualsoud Hanani*, Electrical and Computer Engineering, Birzeit University, Palestine
Rabee Naser, Electrical and Computer Engineering, Birzeit University, Palestine
*Corresponding author. E-mail: abualsoudh@gmail.com

Abstract

This paper describes our automatic dialect identification system for recognizing four major Arabic dialects as well as Modern Standard Arabic (MSA). We adapted the X-vector framework, which was originally developed for speaker recognition, to the task of Arabic dialect identification (ADI). The training and development datasets of the VarDial 2017 and VarDial 2018 ADI shared tasks were used to train and test all of our ADI systems. In addition to the X-vector system, we built systems based on traditional i-vectors, bottleneck features, phonetic features, word transcriptions, and GMM tokens. The X-vector system achieved a score of 0.687 on the test set of the ADI 2018 Discriminating between Similar Languages shared task, outperforming the other systems. Fusing the X-vector system with the i-vector, bottleneck-feature, and word uni-gram systems improved this score slightly, to 0.697.
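Two mechanisms mentioned above can be made concrete with a short sketch: how an X-vector network turns a variable-length utterance into a fixed-length representation, and how scores from several systems can be fused. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. `statistics_pooling` mirrors the statistics-pooling layer of the X-vector architecture, which concatenates the mean and standard deviation of frame-level activations over the whole segment. `fuse_scores` is a generic linear score-level fusion; its weights and offsets are hypothetical placeholders that, in practice, would be trained on development data (for example with multi-class logistic regression). The labels EGY and MSA and all numeric scores are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool frame-level activations (one vector per frame) into a single
    fixed-length vector: per-dimension means followed by per-dimension
    (population) standard deviations."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds  # length 2 * dim, independent of utterance length


def fuse_scores(
    system_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    offsets: Dict[str, float],
) -> Dict[str, float]:
    """Linear score-level fusion (hypothetical weights/offsets):
    fused[c] = offsets[c] + sum_k weights[k] * system_scores[k][c]."""
    return {
        c: offsets[c]
        + sum(weights[k] * scores[c] for k, scores in system_scores.items())
        for c in offsets
    }


def predict_dialect(fused: Dict[str, float]) -> str:
    """Return the class label with the highest fused score."""
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Two frames of 2-dimensional activations -> [mean1, mean2, std1, std2].
    print(statistics_pooling([[1.0, 2.0], [3.0, 2.0]]))
    # Illustrative scores from two hypothetical systems.
    scores = {
        "xvector": {"EGY": 2.0, "MSA": 0.5},
        "ivector": {"EGY": 1.0, "MSA": 1.5},
    }
    fused = fuse_scores(scores, {"xvector": 0.7, "ivector": 0.3},
                        {"EGY": 0.0, "MSA": 0.0})
    print(fused, predict_dialect(fused))
```

Because pooling collapses the time axis, utterances of any duration map to vectors of the same size, which is what lets segment-level layers (and a dialect classifier) operate on them. Linear fusion keeps each system's scores on a common scale with very few trainable parameters, which suits small development sets.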

Keywords

X-vectors; Arabic dialect recognition

Type: Article
Information: Natural Language Engineering, Volume 26, Issue 6: Natural Language Processing for Similar Languages, Varieties, and Dialects, November 2020, pp. 691–700

DOI: https://doi.org/10.1017/S1351324920000091
Copyright: © Cambridge University Press 2020
