Identifying language from songs

Mukherjee, Himadri; Dhar, Ankita; Obaidullah, Sk. Md.; Santosh, K. C.; Phadikar, Santanu; Roy, Kaushik

doi:10.1007/s11042-020-10163-6

1166: Advances of machine learning in data analytics and visual information
Published: 02 January 2021

Volume 80, pages 35319–35339, (2021)

Multimedia Tools and Applications

Himadri Mukherjee¹, Ankita Dhar¹, Sk. Md. Obaidullah², K. C. Santosh³, Santanu Phadikar⁴ (ORCID: orcid.org/0000-0002-7620-5518) & Kaushik Roy¹

Abstract

Audio signal-based applications have evolved significantly over the last decade, from speech recognizers to audio-based search engines, and healthcare is no exception. The same holds when multimedia content needs to be analyzed. One of the most popular and fastest-growing sources of multimedia is music, which can be in either audio or video format. Retrieving data efficiently from this ever-increasing volume of content demands suitable indexing and categorization techniques. Automated song search engines in particular can benefit greatly from a language identifier that separates songs by the language in which they are sung. In this paper, we propose to identify the language of songs using Line Spectral Frequency-Approximation Gradation (LSF-AG) features and an ensemble learning-based classification technique; ensemble learning was chosen for its better generalization ability. In experiments on more than 70 hours of data in three languages (English, Bangla, and Hindi), we achieved a highest average accuracy of 98.61%, which outperforms standard techniques. The robustness of the system was further tested on noisy datasets.
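
The abstract names Line Spectral Frequency (LSF) features but does not describe the Approximation Gradation step, so the following is only a hedged illustration of plain LSF extraction for one audio frame, not the authors' LSF-AG method. It assumes the standard route: linear prediction coefficients via the autocorrelation method and Levinson-Durbin recursion, then the zeros of the symmetric and antisymmetric polynomials P(z) and Q(z) on the unit circle. The function names, Hamming window, frame length of 400 samples, predictor order of 10 and search-grid size are all assumptions chosen for illustration; the ensemble classifier is not reproduced.

```python
import math
import random


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC polynomial [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Assumes r[0] > 0, i.e. the frame is not pure silence.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction-error power
    return a


def lsf_from_lpc(a, grid=2048, iters=50):
    """Line spectral frequencies in radians, sorted, for a = [1, a1, ..., ap]."""
    p = len(a) - 1
    ext = list(a) + [0.0]                   # pad to degree p + 1
    rev = ext[::-1]
    P = [x + y for x, y in zip(ext, rev)]   # A(z) + z^-(p+1) A(1/z), symmetric
    Q = [x - y for x, y in zip(ext, rev)]   # A(z) - z^-(p+1) A(1/z), antisymmetric
    half = (p + 1) / 2.0

    # On the unit circle e^{j*half*w} P(e^{jw}) is real and e^{j*half*w} Q(e^{jw})
    # is purely imaginary, so their zeros are sign changes of these real functions.
    def p_real(w):
        return sum(c * math.cos((half - k) * w) for k, c in enumerate(P))

    def q_imag(w):
        return sum(c * math.sin((half - k) * w) for k, c in enumerate(Q))

    # Midpoint grid on (0, pi) skips the trivial zeros at w = 0 and w = pi.
    ws = [(i + 0.5) * math.pi / grid for i in range(grid)]
    roots = []
    for f in (p_real, q_imag):
        vals = [f(w) for w in ws]
        for i in range(grid - 1):
            if vals[i] * vals[i + 1] < 0:   # bracketed zero: refine by bisection
                lo, hi, f_lo = ws[i], ws[i + 1], vals[i]
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    f_mid = f(mid)
                    if f_lo * f_mid <= 0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                roots.append(0.5 * (lo + hi))
    return sorted(roots)


if __name__ == "__main__":
    # Synthetic AR(2) signal standing in for one 400-sample audio frame.
    random.seed(0)
    x = [0.0, 0.0]
    for _ in range(400):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + random.gauss(0.0, 1.0))
    frame = [s * w for s, w in zip(x[2:], hamming(400))]
    order = 10
    a = levinson_durbin(autocorrelation(frame, order), order)
    print([round(w, 3) for w in lsf_from_lpc(a)])
```

Each frame then yields p LSF values in (0, π) that alternate between the zeros of P(z) and Q(z); how such values are turned into the LSF-AG representation and passed to the ensemble classifier is specific to the paper and not shown here. The grid-plus-bisection search keeps the sketch free of third-party dependencies; numerical libraries typically find the same zeros with a general polynomial root finder instead.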

Author information

Authors and Affiliations

Department of Computer Science, West Bengal State University, Kolkata, India
Himadri Mukherjee, Ankita Dhar & Kaushik Roy
Department of Computer Science and Engineering, Aliah University, Kolkata, India
Sk. Md. Obaidullah
Department of Computer Science, The University of South Dakota, Vermillion, SD, USA
K. C. Santosh
Department of Computer Science & Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
Santanu Phadikar

Corresponding author

Correspondence to Santanu Phadikar.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mukherjee, H., Dhar, A., Obaidullah, S.M. et al. Identifying language from songs. Multimed Tools Appl 80, 35319–35339 (2021). https://doi.org/10.1007/s11042-020-10163-6

Received: 26 February 2020
Revised: 15 October 2020
Accepted: 10 November 2020
Published: 02 January 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s11042-020-10163-6
