short-paper

A study on spoken language identification using deep neural networks

Authors: Alexandra Draghici, Jakob Abeßer, Hanna Lukashevich

AM '20: Proceedings of the 15th International Audio Mostly Conference

Pages 253 - 256

https://doi.org/10.1145/3411109.3411123

Published: 16 September 2020

Abstract

In this paper, we investigate a previously proposed algorithm for spoken language identification based on convolutional neural networks and convolutional recurrent neural networks. We improve the algorithm by modifying the training strategy to ensure an equal class distribution and efficient memory usage. We successfully replicate previous experimental findings using a modified set of languages. Our findings confirm that both convolutional neural networks and convolutional recurrent neural networks are capable of learning language-specific patterns in mel spectrogram representations of speech recordings.
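The abstract does not spell out how the training strategy achieves an equal class distribution with low memory use. The following is only a minimal sketch of one common way to do this, not the authors' implementation: a generator that keeps just file paths in memory and yields batches with the same number of recordings per language, stopping when the smallest class is exhausted. The function name `balanced_batches`, the `(path, label)` sample layout, and the `per_class` parameter are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Sample = Tuple[str, str]  # (file_path, language_label); hypothetical layout


def balanced_batches(
    samples: Sequence[Sample],
    per_class: int,
    seed: int = 0,
) -> Iterator[List[Sample]]:
    """Lazily yield batches with exactly `per_class` items per language.

    Only paths and labels are held in memory; audio would be loaded
    (and converted to mel spectrograms) by the consumer, batch by batch.
    """
    rng = random.Random(seed)

    # Group file paths by language label.
    by_label: Dict[str, List[str]] = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    if not by_label:
        return

    # Shuffle within each class so every epoch sees a different order.
    labels = sorted(by_label)
    for label in labels:
        rng.shuffle(by_label[label])

    # Undersample: the smallest class determines the number of batches,
    # so no language is over-represented.
    n_batches = min(len(by_label[label]) for label in labels) // per_class

    for b in range(n_batches):
        batch: List[Sample] = []
        for label in labels:
            chunk = by_label[label][b * per_class:(b + 1) * per_class]
            batch.extend((path, label) for path in chunk)
        rng.shuffle(batch)  # avoid a fixed class order inside the batch
        yield batch


if __name__ == "__main__":
    # Toy data: 10 English and 4 German file names (made up).
    toy = [(f"en_{i}.wav", "en") for i in range(10)]
    toy += [(f"de_{i}.wav", "de") for i in range(4)]
    for batch in balanced_batches(toy, per_class=2):
        print(sorted(label for _, label in batch))
```

Undersampling to the smallest class is the simplest way to balance classes; oversampling the smaller classes would be an equally plausible alternative when data is scarce.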

Index Terms

Information systems → Information retrieval → Specialized information retrieval → Multimedia and multimodal retrieval → Speech / audio search

Published In

AM '20: Proceedings of the 15th International Audio Mostly Conference

September 2020

281 pages

ISBN: 9781450375634

DOI: 10.1145/3411109

Conference Chairs:
Katharina Groß-Vogt,
Robert Höldrich

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

European Union

Conference

AM'20: Audio Mostly 2020

September 15 - 17, 2020

Graz, Austria

Acceptance Rates

AM '20 Paper Acceptance Rate 29 of 47 submissions, 62%;

Overall Acceptance Rate 177 of 275 submissions, 64%
