
An ensemble model of CNN with Bi-LSTM for automatic singer identification

Published in: Multimedia Tools and Applications

Abstract

In the present-day scenario, gender detection has become significant in content-based multimedia systems, and automated mechanisms for gender identification are in demand for processing massive volumes of data. Singer identification is a popular task in music information recommender systems: identifying the singer of a song from the singer's voice together with background key features such as timbre and pitch. Models such as GMM, SVM, and MLP are widely used for classification and singer identification. However, most current models have a limitation: vocals and instrumental music are separated manually, and only the vocals are used to build and train the model. Deep learning techniques are well suited to unstructured data like music and have exhibited exemplary performance in similar studies. In acoustic modeling, Deep Neural Network (DNN) architectures such as convolutional neural networks (CNN) have played a promising role in classifying unstructured and poorly labeled data. In the current study, an ensemble model combining a CNN with a bi-directional LSTM is proposed for singer identification from spectrogram images generated from audio clips. CNN models are proven to handle variable-length input data well by identifying salient features, while the Bi-LSTM improves accuracy by retaining essential features over time and capturing temporal contextual information. Experiments are performed on Indian songs and the MIR-1K data set, and the proposed model outperforms existing models with a prediction accuracy of 97.4%. The performance of the proposed model is compared against existing models in the current study.
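The CNN-plus-Bi-LSTM pipeline described above can be sketched in code. The following PyTorch snippet is an illustrative sketch only: the layer sizes, kernel widths, and hidden dimensions are assumptions for demonstration and are not the exact configuration from the paper. A small convolutional stack extracts local time-frequency features from a spectrogram image, the frequency axis is collapsed into a feature vector per time step, and a bidirectional LSTM models the resulting sequence before a final linear layer scores each candidate singer.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Sketch of a CNN + Bi-LSTM singer classifier for spectrograms.

    Assumed shapes: input is (batch, 1, n_mels, time); the CNN halves
    both axes twice, so feature vectors per time step have dimension
    32 * (n_mels // 4). All hyperparameters here are illustrative.
    """

    def __init__(self, n_singers: int, n_mels: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),               # halve frequency and time axes
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 32 * (n_mels // 4)      # channels x pooled freq bins
        self.bilstm = nn.LSTM(feat_dim, 64, batch_first=True,
                              bidirectional=True)
        self.fc = nn.Linear(2 * 64, n_singers)

    def forward(self, x):                  # x: (batch, 1, n_mels, time)
        f = self.cnn(x)                    # (batch, 32, n_mels//4, time//4)
        f = f.permute(0, 3, 1, 2)          # time axis becomes the sequence
        f = f.flatten(2)                   # (batch, time//4, feat_dim)
        out, _ = self.bilstm(f)            # (batch, time//4, 128)
        return self.fc(out[:, -1])         # classify from final time step

model = CNNBiLSTM(n_singers=10)
logits = model(torch.randn(2, 1, 64, 128))  # two fake 64x128 spectrograms
print(logits.shape)                         # torch.Size([2, 10])
```

Using the last Bi-LSTM output as the utterance summary is one common choice; mean-pooling over time steps is an equally reasonable alternative when clip lengths vary widely.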


Data availability

Not applicable for the current study.



Author information


Corresponding author

Correspondence to Mukkamala S. N. V. Jitendra.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest concerning any aspect of the current study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jitendra, M.S.N.V., Radhika, Y. An ensemble model of CNN with Bi-LSTM for automatic singer identification. Multimed Tools Appl 82, 38853–38874 (2023). https://doi.org/10.1007/s11042-023-14802-6

