Abstract
As a subset of music information retrieval (MIR), predominant musical instrument recognition (PMIR) has attracted substantial interest in recent years due to its uniqueness and high commercial value in key areas of music analysis such as music retrieval and automatic music transcription. As deep learning and artificial intelligence have drawn growing attention, they have been applied ever more widely in MIR, producing breakthroughs in sub-fields that had previously stalled at a bottleneck. In this paper, the Hilbert-Huang Transform (HHT) is employed to map one-dimensional audio data into a two-dimensional matrix format, after which a deep convolutional neural network is developed to learn rich and effective features for PMIR. In total, 6705 audio pieces covering 11 musical instruments are used to validate the efficacy of the proposed approach. The results are compared against four benchmark methods and show significant improvements in precision, recall and F1 measures.
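To make the pipeline concrete, the sketch below shows one plausible realisation of the HHT-to-CNN idea: empirical mode decomposition splits the waveform into intrinsic mode functions (IMFs), the Hilbert transform of each IMF yields instantaneous amplitude and frequency, these are binned into a two-dimensional Hilbert spectrum, and a small CNN classifies the resulting matrix. The library choices (PyEMD, SciPy, librosa, PyTorch), the network layout, and all bin sizes are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn
from PyEMD import EMD
from scipy.signal import hilbert

def hilbert_spectrum(audio, sr, n_freq_bins=128, n_time_bins=128):
    """Map a 1-D signal to a 2-D time-frequency matrix via the HHT:
    EMD yields IMFs; the Hilbert transform of each IMF gives instantaneous
    amplitude and frequency, accumulated into a fixed-size amplitude grid."""
    imfs = EMD().emd(audio)
    spec = np.zeros((n_freq_bins, n_time_bins))
    t_idx = np.linspace(0, len(audio) - 1, n_time_bins).astype(int)
    for imf in imfs:
        analytic = hilbert(imf)
        amp = np.abs(analytic)
        phase = np.unwrap(np.angle(analytic))
        inst_freq = np.diff(phase) * sr / (2.0 * np.pi)   # Hz
        inst_freq = np.clip(inst_freq, 0.0, sr / 2.0)
        f_idx = (inst_freq / (sr / 2.0) * (n_freq_bins - 1)).astype(int)
        for j, t in enumerate(t_idx):
            ti = min(t, len(f_idx) - 1)                   # diff() is one sample short
            spec[f_idx[ti], j] += amp[ti]
    return np.log1p(spec)                                 # compress dynamic range

class SimpleCNN(nn.Module):
    """A small CNN over the 2-D Hilbert spectrum, 11 instrument classes."""
    def __init__(self, n_classes=11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 32 * 32, n_classes)  # 128 -> 32 after two pools

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Usage: load a short clip, build the HHT image, run the (untrained) classifier.
audio, sr = librosa.load(librosa.example("trumpet"), duration=3.0)
x = torch.tensor(hilbert_spectrum(audio, sr), dtype=torch.float32)[None, None]
logits = SimpleCNN()(x)   # shape (1, 11); training is omitted in this sketch
```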
Acknowledgements
The authors would like to thank Dr Yijun Yan for useful discussions and Calum MacLellan for kindly proofreading the paper.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Li, X., Wang, K., Soraghan, J., Ren, J.: Fusion of Hilbert-Huang transform and deep convolutional neural network for predominant musical instruments recognition. In: Romero, J., Ekárt, A., Martins, T., Correia, J. (eds.) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2020. Lecture Notes in Computer Science, vol. 12103. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-43859-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43858-6
Online ISBN: 978-3-030-43859-3