
Fusion of Hilbert-Huang Transform and Deep Convolutional Neural Network for Predominant Musical Instruments Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12103)

Abstract

As a subfield of music information retrieval (MIR), predominant musical instrument recognition (PMIR) has attracted substantial interest in recent years owing to its distinctive nature and high commercial value in key areas of music analysis such as music retrieval and automatic music transcription. As deep learning and artificial intelligence have gained attention, they have been applied increasingly widely in MIR, enabling breakthroughs in sub-fields where progress had stalled. In this paper, the Hilbert-Huang Transform (HHT) is employed to map one-dimensional audio data into a two-dimensional matrix, and a deep convolutional neural network is then developed to learn rich and effective features for PMIR. A total of 6705 audio pieces covering 11 musical instruments are used to validate the efficacy of the proposed approach. Compared with four benchmark methods, the results show significant improvements in precision, recall and F1 score.
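The abstract describes using the HHT to turn one-dimensional audio into a two-dimensional matrix for a CNN. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: it assumes NumPy and SciPy, uses a simplified empirical mode decomposition (EMD) with a fixed number of sifting iterations and constant-extension envelope boundaries, and then bins the instantaneous frequency of each intrinsic mode function (IMF) into a frequency-by-time amplitude matrix (the Hilbert spectrum). The function names `emd` and `hilbert_spectrum` and parameters such as `n_freq_bins` and `sift_iters` are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import hilbert


def _envelope_mean(x):
    """Mean of upper/lower cubic-spline envelopes, or None if too few extrema."""
    t = np.arange(len(x))
    d = np.diff(x)
    right = np.hstack([d, 0])  # x[i+1] - x[i]
    left = np.hstack([0, d])   # x[i] - x[i-1]
    maxima = np.where((left > 0) & (right < 0))[0]
    minima = np.where((left < 0) & (right > 0))[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    n = len(x)
    # Assumed boundary treatment: hold the outermost extremum value at each end.
    up_idx = np.concatenate(([0], maxima, [n - 1]))
    up_val = np.concatenate(([x[maxima[0]]], x[maxima], [x[maxima[-1]]]))
    lo_idx = np.concatenate(([0], minima, [n - 1]))
    lo_val = np.concatenate(([x[minima[0]]], x[minima], [x[minima[-1]]]))
    upper = CubicSpline(up_idx, up_val)(t)
    lower = CubicSpline(lo_idx, lo_val)(t)
    return (upper + lower) / 2.0


def emd(x, max_imfs=6, sift_iters=10):
    """Simplified EMD: a fixed number of sifting passes per IMF (no stopping test)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residue) is None:  # residue is (near-)monotonic: stop
            break
        h = residue.copy()
        for _ in range(sift_iters):
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return np.array(imfs).reshape(-1, len(residue)), residue


def hilbert_spectrum(x, fs, n_freq_bins=64, max_imfs=6):
    """Map a 1-D signal to a 2-D (frequency x time) matrix of instantaneous amplitude."""
    imfs, _ = emd(x, max_imfs=max_imfs)
    n = len(x)
    spec = np.zeros((n_freq_bins, n - 1))
    edges = np.linspace(0.0, fs / 2.0, n_freq_bins + 1)
    cols = np.arange(n - 1)
    for imf in imfs:
        analytic = hilbert(imf)
        amp = np.abs(analytic)
        phase = np.unwrap(np.angle(analytic))
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # Hz, one value per step
        # Out-of-range frequencies are clipped into the first/last row.
        rows = np.clip(np.digitize(inst_freq, edges) - 1, 0, n_freq_bins - 1)
        spec[rows, cols] += amp[:-1]
    return spec, edges


if __name__ == "__main__":
    fs = 1000
    t = np.arange(fs) / fs
    spec, edges = hilbert_spectrum(np.sin(2 * np.pi * 50 * t), fs)
    peak = int(np.argmax(spec.sum(axis=1)))
    print(spec.shape, edges[peak], edges[peak + 1])
```

For a 1 s, 50 Hz sine sampled at 1 kHz, the energy should concentrate in the frequency row whose bin contains 50 Hz. A CNN would typically consume such a matrix after pooling or resizing in time; the resolution and preprocessing used in the paper are not stated in this abstract.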



Acknowledgements

The authors would like to thank Dr Yijun Yan for useful discussions and Calum MacLellan for kindly proofreading the paper.

Author information

Correspondence to Jinchang Ren.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, X., Wang, K., Soraghan, J., Ren, J. (2020). Fusion of Hilbert-Huang Transform and Deep Convolutional Neural Network for Predominant Musical Instruments Recognition. In: Romero, J., Ekárt, A., Martins, T., Correia, J. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2020. Lecture Notes in Computer Science, vol 12103. Springer, Cham. https://doi.org/10.1007/978-3-030-43859-3_6


  • DOI: https://doi.org/10.1007/978-3-030-43859-3_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43858-6

  • Online ISBN: 978-3-030-43859-3

  • eBook Packages: Computer Science, Computer Science (R0)
