
Combined angular margin and cosine margin softmax loss for music classification based on spectrograms

  • S.I.: Effective and Efficient Deep Learning
  • Published in: Neural Computing and Applications

Abstract

Spectrograms provide rich feature information about music data, and significant progress has been made in music classification using spectrograms with convolutional neural networks (CNNs). However, the softmax loss commonly used in existing CNNs lacks sufficient power to discriminate the deep features of music. To overcome this limitation, we propose a Combined Angular Margin and Cosine Margin Softmax Loss (AMCM-Softmax) that enhances intra-class compactness and inter-class discrepancy simultaneously. Specifically, the weight vectors and feature vectors are normalized to eliminate radial variations. An angular margin parameter and a cosine margin parameter are then introduced to maximize the decision margin by enforcing angular and cosine margin constraints, so feature discrimination is enhanced by both normalization and margin maximization. The decision boundary and target logit curve of AMCM-Softmax admit a clear geometric interpretation. Extensive experiments on music datasets show that AMCM-Softmax consistently outperforms current state-of-the-art approaches in genre and emotion classification. Our work also shows that a margin-based loss function can improve performance and can be incorporated into advanced CNN models for music classification.
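The abstract's mechanism can be sketched in code: normalize features and class weights, then penalize the target-class logit with both an angular margin and a cosine margin. The following is a minimal NumPy illustration, not the authors' implementation; the target-logit form s·(cos(θ_y + m1) − m2), the function names, and the values of s, m1, and m2 are assumptions based on the abstract's description.

```python
import numpy as np

def amcm_softmax_logits(features, weights, labels, s=30.0, m1=0.35, m2=0.35):
    """Sketch of combined angular/cosine margin logits.

    features: (batch, dim) raw feature vectors
    weights:  (dim, classes) classifier weight matrix (columns = class vectors)
    labels:   (batch,) integer class labels
    """
    # L2-normalize features and weights to eliminate radial variations
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(f @ w, -1.0, 1.0)   # cosine similarities, (batch, classes)
    logits = s * cos                  # non-target classes: plain scaled cosine
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])
    # Target class: angular margin m1 inside the cosine, cosine margin m2 outside
    logits[rows, labels] = s * (np.cos(theta + m1) - m2)
    return logits

def cross_entropy(logits, labels):
    """Mean negative log-likelihood over the batch (numerically stable)."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()
```

Because the margins shrink the target-class logit relative to plain scaled cosine, the network must push each feature closer to its class weight vector to keep the loss low, which is the source of the intra-class compactness and inter-class discrepancy described above.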



Notes

  1. https://www.kaggle.com/makvel/mer500



Acknowledgements

This work was supported by the Jiangsu Provincial Key Constructive Laboratory for Big Data of Psychology and Cognitive Science under Grant No.72592062003G, the Natural Science Foundation of the Colleges and Universities in Anhui Province of China under Grant No. KJ2020A0035 and No. KJ2021A0640, and the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA).

Funding

  • Jiangsu Provincial Key Constructive Laboratory for Big Data of Psychology and Cognitive Science, Grant No. 72592062003G (Xiaofeng Yuan)
  • Natural Science Foundation of the Colleges and Universities in Anhui Province of China, Grant No. KJ2020A0035 (Yi Yang)
  • Natural Science Foundation of the Colleges and Universities in Anhui Province of China, Grant No. KJ2021A0640 (Yang Wang)
  • Hong Kong Innovation and Technology Commission, InnoHK Project CIMDA (Hong Yan)

Author information

Corresponding authors

Correspondence to Jingxian Li or Lixin Han.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, J., Han, L., Wang, Y. et al. Combined angular margin and cosine margin softmax loss for music classification based on spectrograms. Neural Comput & Applic 34, 10337–10353 (2022). https://doi.org/10.1007/s00521-022-06896-0

