
Image-based features for speech signal classification

Multimedia Tools and Applications

Abstract

Analyzing speech signals is a crucial task within pattern classification. People often mix languages while talking, which complicates this task. This is especially common in India, where the languages in use differ from one state to another. The southern part of India is particularly affected, which makes distinguishing its languages important. In this paper, we propose image-based features for speech signal classification, because different languages can be identified by visualizing their speech patterns. Modified Mel frequency cepstral coefficient (MFCC) features, namely MFCC-Statistics Grade (MFCC-SG), were extracted, visualized using plotting techniques, and then fed to a convolutional neural network. In this study, we used the top four languages, namely Telugu, Tamil, Malayalam, and Kannada. Experiments were performed on more than 900 hours of data collected from YouTube, yielding over 150,000 images, and the highest accuracy obtained was 94.51%.
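
The abstract does not define MFCC-SG or say which plotting technique was used, so the following is only a minimal sketch of the general idea: turning a per-frame feature matrix into a grayscale image that an image classifier such as a convolutional neural network could take as input. The function names, the min-max scaling, and the toy values are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def features_to_image(frames: List[List[float]]) -> List[List[int]]:
    """Map a (frames x coefficients) matrix to an 8-bit grayscale image.

    Image rows are coefficients and image columns are frames, so time
    runs left to right. This layout is an assumption, not the paper's.
    """
    if not frames or not frames[0]:
        raise ValueError("need at least one frame with one coefficient")
    n_coeffs = len(frames[0])
    if any(len(f) != n_coeffs for f in frames):
        raise ValueError("all frames must have the same number of coefficients")

    # Global min-max scaling to the 0..255 range (hypothetical choice).
    lo = min(min(f) for f in frames)
    hi = max(max(f) for f in frames)
    span = hi - lo

    image = []
    for c in range(n_coeffs):
        row = []
        for f in frames:
            # A constant input maps to 0 to avoid division by zero.
            level = 0 if span == 0 else round(255 * (f[c] - lo) / span)
            row.append(level)
        image.append(row)
    return image


def to_pgm(image: List[List[int]]) -> str:
    """Serialize the image as plain-text PGM (P2)."""
    height, width = len(image), len(image[0])
    rows = "\n".join(" ".join(str(v) for v in row) for row in image)
    return f"P2\n{width} {height}\n255\n{rows}\n"


# Hypothetical toy input: 3 frames with 2 coefficients each (not real MFCC values).
toy_frames = [[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
toy_image = features_to_image(toy_frames)
```

In practice the feature matrix would come from an MFCC extractor applied to each audio clip, and the resulting images would be resized to the fixed input shape the network expects. Both steps are left out here because the abstract gives no details about them.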



Author information

Corresponding author

Correspondence to Ankita Dhar.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mukherjee, H., Dhar, A., Obaidullah, S.M. et al. Image-based features for speech signal classification. Multimed Tools Appl 79, 34913–34929 (2020). https://doi.org/10.1007/s11042-019-08553-6


  • DOI: https://doi.org/10.1007/s11042-019-08553-6
