
An ensemble model of CNN with Bi-LSTM for automatic singer identification

Published in: Multimedia Tools and Applications

Abstract

In the present-day scenario, gender detection has become significant in content-based multimedia systems, and automated mechanisms for gender identification are in demand for processing massive volumes of data. Singer identification is a popular task in music information recommender systems: identifying the singer of a song from the singer's voice together with background key features such as timbre and pitch. Models such as GMM, SVM, and MLP are widely used for classification and singer identification. However, most current models have a limitation: vocals and instrumental music are separated manually, and only the vocals are used to build and train the model. Deep learning techniques are well suited to unstructured data like music and have exhibited exemplary performance in similar studies. In acoustic modeling, Deep Neural Network (DNN) architectures such as convolutional neural networks (CNN) have played a promising role in classifying unstructured and poorly labeled data. In the current study, an ensemble model combining a CNN with a bi-directional LSTM is proposed for singer identification from spectrogram images generated from audio clips. CNN models are proven to handle variable-length input data well by identifying salient features, while the Bi-LSTM improves accuracy by retaining essential features over time and capturing temporal contextual information. Experiments are performed on Indian songs and the MIR-1K data set, and the proposed model outperforms existing models with a prediction accuracy of 97.4%. The performance of the proposed model is compared against existing models in the current study.
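The CNN-plus-Bi-LSTM pipeline described above can be sketched in code. The following PyTorch snippet is an illustrative sketch only: the layer sizes, kernel widths, and hidden dimensions are assumptions for demonstration and are not the exact configuration from the paper. A small convolutional stack extracts local time-frequency features from a spectrogram image, the frequency axis is collapsed into a feature vector per time step, and a bidirectional LSTM models the resulting sequence before a final linear layer scores each candidate singer.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Sketch of a CNN + Bi-LSTM singer classifier for spectrograms.

    Assumed shapes: input is (batch, 1, n_mels, time); the CNN halves
    both axes twice, so feature vectors per time step have dimension
    32 * (n_mels // 4). All hyperparameters here are illustrative.
    """

    def __init__(self, n_singers: int, n_mels: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),               # halve frequency and time axes
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 32 * (n_mels // 4)      # channels x pooled freq bins
        self.bilstm = nn.LSTM(feat_dim, 64, batch_first=True,
                              bidirectional=True)
        self.fc = nn.Linear(2 * 64, n_singers)

    def forward(self, x):                  # x: (batch, 1, n_mels, time)
        f = self.cnn(x)                    # (batch, 32, n_mels//4, time//4)
        f = f.permute(0, 3, 1, 2)          # time axis becomes the sequence
        f = f.flatten(2)                   # (batch, time//4, feat_dim)
        out, _ = self.bilstm(f)            # (batch, time//4, 128)
        return self.fc(out[:, -1])         # classify from final time step

model = CNNBiLSTM(n_singers=10)
logits = model(torch.randn(2, 1, 64, 128))  # two fake 64x128 spectrograms
print(logits.shape)                         # torch.Size([2, 10])
```

Using the last Bi-LSTM output as the utterance summary is one common choice; mean-pooling over time steps is an equally reasonable alternative when clip lengths vary widely.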


Data availability

Not applicable for the current study.



Author information


Corresponding author

Correspondence to Mukkamala S. N. V. Jitendra.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest concerning any aspect of the current study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jitendra, M.S.N.V., Radhika, Y. An ensemble model of CNN with Bi-LSTM for automatic singer identification. Multimed Tools Appl 82, 38853–38874 (2023). https://doi.org/10.1007/s11042-023-14802-6

