Abstract
Audio signal-based applications have evolved significantly over the last decade, from speech recognizers to audio-based search engines, and healthcare is no exception. The same holds when multimedia content needs to be analyzed. One of the most popular and fastest-growing sources of multimedia is music, which can be in either audio or video format. Retrieving data efficiently from this ever-increasing volume of information demands different indexing and categorization techniques. Automated song search engines can benefit greatly from a language identifier that segregates songs by the language used. In this paper, we propose to identify the language of songs using Line Spectral Frequency-Approximation Gradation (LSF-AG) features and an ensemble learning-based classification technique. Ensemble learning was used because of its better generalization ability. In experiments on more than 70 hours of data in three languages (English, Bangla, and Hindi), we achieved a highest average accuracy of 98.61%, which outperforms standard techniques. Further, the robustness of the system was tested on noisy datasets.
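To make the idea of ensemble-based classification concrete, the following is a minimal sketch only, not the method used in the paper. The abstract does not name the specific ensemble, so this example assumes bagging with a majority vote over simple nearest-centroid base learners. The two-dimensional feature vectors in `TOY_SAMPLES` are made-up stand-ins rather than real LSF-AG features, and all function names are hypothetical. The bootstrap is drawn within each label, a simplification that ensures every base learner sees all three languages on such a small toy set.

```python
import random
from collections import Counter

# Hypothetical 2-D feature vectors standing in for real song features.
TOY_SAMPLES = [
    ((0.0, 0.0), "English"), ((0.2, 0.1), "English"),
    ((0.1, 0.3), "English"), ((0.3, 0.2), "English"),
    ((5.0, 5.0), "Bangla"), ((5.2, 4.9), "Bangla"),
    ((4.8, 5.1), "Bangla"), ((5.1, 5.3), "Bangla"),
    ((0.0, 5.0), "Hindi"), ((0.2, 5.1), "Hindi"),
    ((-0.1, 4.8), "Hindi"), ((0.1, 5.2), "Hindi"),
]

def stratified_bootstrap(samples, rng):
    """Resample with replacement inside each label group."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))
    resampled = []
    for group in by_label.values():
        resampled.extend(rng.choice(group) for _ in group)
    return resampled

def train_centroids(samples):
    """Fit a nearest-centroid base learner: one mean vector per label."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(model, features):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def distance(label):
        return sum((f - c) ** 2 for f, c in zip(features, model[label]))
    return min(model, key=distance)

def train_bagged_ensemble(samples, n_learners=15, seed=0):
    """Train each base learner on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    return [train_centroids(stratified_bootstrap(samples, rng))
            for _ in range(n_learners)]

def predict_ensemble(ensemble, features):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(predict_centroid(model, features) for model in ensemble)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    ensemble = train_bagged_ensemble(TOY_SAMPLES)
    print(predict_ensemble(ensemble, (0.1, 0.1)))
```

Because each base learner is trained on a different resample, individual errors tend to differ between learners, and the vote averages them out. This is one common intuition behind the generalization benefit that the abstract attributes to ensemble learning.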
Cite this article
Mukherjee, H., Dhar, A., Obaidullah, S.M. et al. Identifying language from songs. Multimed Tools Appl 80, 35319–35339 (2021). https://doi.org/10.1007/s11042-020-10163-6
DOI: https://doi.org/10.1007/s11042-020-10163-6