Speech/music classification using visual and spectral chromagram features

  • Original Research
  • Published:
Journal of Ambient Intelligence and Humanized Computing

Abstract

Automatic speech/music classification is an important tool in multimedia content analysis and retrieval: it efficiently categorizes input audio and stores it in the relevant classes. This article proposes the use of chromagram textural and spectral features for speech/music classification. The chromagram textural feature set is obtained by transforming the input audio into a chromagram image representation and then extracting uniform local binary pattern (LBP) textural descriptors. The chroma spectral feature set comprises novel chroma bin features that exploit the tonality present in music signals. An optimal subset of the original features is selected using eigenvector centrality-based feature selection, which removes redundant and irrelevant features and further improves prediction performance. The algorithm is evaluated on the S&S, GTZAN and MUSAN databases, demonstrating the suitability of both chroma spectral and visual features for the classification task. Extensive experiments with a support vector machine (SVM) classifier show that the chromagram textural descriptors outperform other state-of-the-art approaches. Good results are also achieved under mismatched training and testing conditions.
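The abstract gives no implementation details, so the following is only a minimal sketch of the textural branch under stated assumptions: the chromagram (12 pitch classes by T frames) is taken as already computed, it is min-max scaled to 8-bit grey levels, and the classic 8-neighbour, radius-1 uniform LBP with the 59-bin "u2" mapping is applied to it. The neighbour ordering, the scaling step and the names `to_gray` and `uniform_lbp_histogram` are illustrative choices, not the authors' code.

```python
from typing import List, Sequence

# Radius-1 neighbours in a fixed circular order (an illustrative assumption).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def transitions(code: int, bits: int = 8) -> int:
    """Count 0/1 changes around the circular bit pattern of `code`."""
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1) for i in range(bits))


def build_u2_table(bits: int = 8) -> List[int]:
    """Give each uniform code (at most 2 transitions) its own bin; all others share the last bin."""
    uniform = [c for c in range(2 ** bits) if transitions(c, bits) <= 2]
    shared = len(uniform)  # index of the single non-uniform bin
    lookup = {c: i for i, c in enumerate(uniform)}
    return [lookup.get(c, shared) for c in range(2 ** bits)]


U2_TABLE = build_u2_table()
N_BINS = max(U2_TABLE) + 1  # 58 uniform bins + 1 non-uniform bin = 59 for P = 8


def to_gray(chroma: Sequence[Sequence[float]]) -> List[List[int]]:
    """Hypothetical step: min-max scale a chromagram (12 x T) to 0..255 grey levels."""
    flat = [v for row in chroma for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant image
    return [[int(round(255 * (v - lo) / span)) for v in row] for row in chroma]


def uniform_lbp_histogram(chroma: Sequence[Sequence[float]]) -> List[float]:
    """Normalised 59-bin uniform-LBP histogram of the chromagram image (border pixels skipped)."""
    img = to_gray(chroma)
    rows, cols = len(img), len(img[0])
    hist = [0] * N_BINS
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # Threshold each neighbour against the centre pixel.
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[U2_TABLE[code]] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]
```

Calling `uniform_lbp_histogram(chroma)` on a 12 x T chromagram yields one normalised 59-dimensional texture vector per clip, which could then be concatenated with spectral features before classification.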
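Eigenvector centrality-based selection treats each feature as a node of a graph and ranks features by the principal eigenvector of the graph's affinity matrix. The abstract does not specify how that graph is built, so the sketch below assumes a hypothetical affinity: a blend, weighted by `alpha`, of the two features' normalised Fisher scores and one minus their absolute Pearson correlation, with binary labels (0 = speech, 1 = music, an assumed encoding). The weights and the names `ecfs_select` and `eigenvector_centrality` are illustrative, not the paper's formulation.

```python
import math
from typing import List, Sequence


def fisher_score(col: Sequence[float], labels: Sequence[int]) -> float:
    """Two-class Fisher score; assumes both classes (0 and 1) are present."""
    a = [x for x, y in zip(col, labels) if y == 1]
    b = [x for x, y in zip(col, labels) if y == 0]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / len(a)
    vb = sum((x - mb) ** 2 for x in b) / len(b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation; returns 0 for a constant column."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def eigenvector_centrality(adj: Sequence[Sequence[float]], iters: int = 500,
                           tol: float = 1e-10) -> List[float]:
    """Power iteration on (A + I): same eigenvectors as A, but no sign oscillation."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(p - q) for p, q in zip(x, y)) < tol:
            return y
        x = y
    return x


def ecfs_select(X: Sequence[Sequence[float]], labels: Sequence[int], k: int,
                alpha: float = 0.5) -> List[int]:
    """Return indices of the k most central features under the assumed affinity."""
    d = len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    rel = [fisher_score(c, labels) for c in cols]
    top = max(rel) or 1.0
    rel = [r / top for r in rel]  # relevance scaled to [0, 1]
    adj = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if i != j:
                # Reward pairs that are individually discriminative and mutually uncorrelated.
                adj[i][j] = (alpha * (rel[i] + rel[j]) / 2
                             + (1 - alpha) * (1 - abs(pearson(cols[i], cols[j]))))
    scores = eigenvector_centrality(adj)
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:k]
```

Shifting by the identity leaves the eigenvectors unchanged while making the dominant eigenvalue strictly largest in magnitude, so plain power iteration converges. The returned indices would then pick the columns passed to the SVM.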

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Acknowledgements

The authors would like to thank Eric Scheirer and Malcolm Slaney for making their speech/music database available to us. We also acknowledge David Snyder, Guoguo Chen and Daniel Povey for providing the MUSAN corpus. We are also thankful to the anonymous reviewers for their insightful and constructive comments.

Author information

Corresponding author

Correspondence to Gajanan K. Birajdar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Birajdar, G.K., Patil, M.D. Speech/music classification using visual and spectral chromagram features. J Ambient Intell Human Comput 11, 329–347 (2020). https://doi.org/10.1007/s12652-019-01303-4

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12652-019-01303-4
