
Identifying language from songs


Abstract

Audio signal-based applications have evolved significantly over the last decade, from speech recognizers to audio-based search engines, and domains such as healthcare are no exception. The same holds true when multimedia content needs to be analyzed. One of the most popular and rapidly growing sources of multimedia is music, which may be available in either audio or video format. Such ever-increasing content demands dedicated indexing and categorization techniques for efficient retrieval. Automated song search engines, in particular, can benefit greatly from a language identifier that segregates songs by the language in which they are sung. In this paper, we propose to identify the language of songs using Line Spectral Frequency-Approximation Gradation (LSF-AG) features and an ensemble learning-based classification technique; ensemble learning was chosen for its better generalization ability. Using more than 70 hours of data covering three languages (English, Bangla, and Hindi), our experiments achieved a highest average accuracy of 98.61%, which outperforms standard techniques. The robustness of the system was further tested on noisy versions of the datasets.
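
As a rough illustration of the kind of pipeline the abstract describes, the sketch below extracts per-frame Line Spectral Frequencies (via LPC analysis) from a song, pools them into a fixed-length vector, and feeds such vectors to an ensemble classifier. This is a minimal sketch, not the authors' LSF-AG method: the Approximation Gradation step is omitted, and the sampling rate, frame length, LPC order, statistical pooling, and Random Forest settings are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch of LSF feature extraction + ensemble classification for
# song language identification. All parameter choices here are assumptions.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def lpc_to_lsf(a, tol=1e-4):
    """Convert LPC coefficients a = [1, a1, ..., ap] to p line spectral
    frequencies (radians in (0, pi)) via the sum/difference polynomials."""
    a_ext = np.concatenate([a, [0.0]])
    P = a_ext + a_ext[::-1]          # symmetric (sum) polynomial
    Q = a_ext - a_ext[::-1]          # antisymmetric (difference) polynomial
    ang = np.angle(np.concatenate([np.roots(P), np.roots(Q)]))
    # Keep non-trivial roots on the upper half of the unit circle, sorted.
    return np.sort(ang[(ang > tol) & (ang < np.pi - tol)])

def song_to_lsf_features(path, sr=16000, frame_len=400, hop=160, order=12):
    """Frame a song and pool per-frame LSFs (mean and std) into one vector."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T
    window = np.hamming(frame_len)
    lsfs = []
    for frame in frames:
        if np.max(np.abs(frame)) < 1e-4:      # skip near-silent frames
            continue
        a = librosa.lpc(frame * window, order=order)
        f = lpc_to_lsf(a)
        if len(f) == order:                   # guard against stray roots
            lsfs.append(f)
    lsfs = np.array(lsfs)
    return np.concatenate([lsfs.mean(axis=0), lsfs.std(axis=0)])

# Hypothetical usage: `songs` is a list of (file_path, language_label) pairs.
# X = np.array([song_to_lsf_features(p) for p, _ in songs])
# y = np.array([lang for _, lang in songs])
# clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
```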


Author information

Corresponding author

Correspondence to Santanu Phadikar.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Mukherjee, H., Dhar, A., Obaidullah, S.M. et al. Identifying language from songs. Multimed Tools Appl 80, 35319–35339 (2021). https://doi.org/10.1007/s11042-020-10163-6

