DOI:10.1145/3411109.3411123
short-paper

A study on spoken language identification using deep neural networks

Published: 16 September 2020

Abstract

In this paper, we investigate a previously proposed algorithm for spoken language identification based on convolutional neural networks and convolutional recurrent neural networks. We improve the algorithm by modifying the training strategy to ensure an equal class distribution and efficient memory usage. We successfully replicate previous experimental findings using a modified set of languages. Our findings confirm that both convolutional neural networks and convolutional recurrent neural networks are capable of learning language-specific patterns in mel spectrogram representations of speech recordings.
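
The abstract does not spell out how the training strategy achieves an equal class distribution with low memory use. As a rough illustration only, and not the authors' implementation, one common way to get both properties is a generator that keeps only file paths in memory and draws the same number of recordings per language for every batch. The function name `balanced_batches`, its parameters, and the file names below are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def balanced_batches(
    samples: Sequence[Tuple[str, str]],
    per_class: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[List[Tuple[str, str]]]:
    """Yield batches holding `per_class` items from every language.

    `samples` is a sequence of (file_path, language) pairs. Only the paths
    are stored; the caller would load audio (and compute mel spectrograms)
    one batch at a time, which keeps memory use bounded. This is an assumed
    design, not the method described in the paper.
    """
    rng = random.Random(seed)

    # Group file paths by language label.
    by_label: Dict[str, List[str]] = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    labels = sorted(by_label)
    for _ in range(num_batches):
        batch: List[Tuple[str, str]] = []
        for label in labels:
            # Sample with replacement so that small classes can still
            # contribute their full quota to each batch.
            chosen = rng.choices(by_label[label], k=per_class)
            batch.extend((path, label) for path in chosen)
        rng.shuffle(batch)  # avoid a fixed label order inside the batch
        yield batch


# Hypothetical file list with an unbalanced label distribution.
files = [
    ("en_0.wav", "en"),
    ("en_1.wav", "en"),
    ("en_2.wav", "en"),
    ("de_0.wav", "de"),
]

for batch in balanced_batches(files, per_class=2, num_batches=1):
    print(sorted(label for _, label in batch))
```

Sampling with replacement is one design choice among several; undersampling the larger classes would also equalize the distribution, at the cost of discarding data.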


Published In

ACM Other conferences
AM '20: Proceedings of the 15th International Audio Mostly Conference
September 2020
281 pages
ISBN:9781450375634
DOI:10.1145/3411109
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural networks
  2. convolutional recurrent neural networks
  3. speech recognition
  4. spoken language identification

Qualifiers

  • Short-paper

Funding Sources

  • European Union

Conference

AM'20
AM'20: Audio Mostly 2020
September 15 - 17, 2020
Graz, Austria

Acceptance Rates

AM '20 Paper Acceptance Rate 29 of 47 submissions, 62%;
Overall Acceptance Rate 177 of 275 submissions, 64%
