Fusion of Hilbert-Huang Transform and Deep Convolutional Neural Network for Predominant Musical Instruments Recognition

Li, Xiaoquan; Wang, Kaiqi; Soraghan, John; Ren, Jinchang

doi:10.1007/978-3-030-43859-3_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12103)

Included in the following conference series:

International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar)

Abstract

As a subfield of music information retrieval (MIR), predominant musical instrument recognition (PMIR) has attracted substantial interest in recent years because of its distinctiveness and high commercial value in key areas of music analysis such as music retrieval and automatic music transcription. Deep learning and artificial intelligence have been applied increasingly widely in MIR and have produced breakthroughs in several sub-fields where progress had stalled. In this paper, the Hilbert-Huang Transform (HHT) is employed to map one-dimensional audio data into a two-dimensional matrix format, and a deep convolutional neural network is then developed to learn rich and effective features for PMIR. In total, 6705 audio pieces covering 11 musical instruments are used to validate the efficacy of the proposed approach. The results are compared with four benchmark methods and show significant improvements in precision, recall and F1 measures.
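
The abstract does not give implementation details, so the following is only a minimal sketch of how a Hilbert spectrum can turn one-dimensional signals into a two-dimensional time-frequency matrix. It assumes the empirical mode decomposition step of the HHT has already produced the intrinsic mode functions (IMFs); here two synthetic tones stand in for them. All function names, the sampling rate and the bin count are illustrative assumptions, not the authors' code or settings.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short demo signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Analytic signal: keep DC/Nyquist, double positive frequencies, zero negative ones."""
    n = len(x)
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    spectrum = dft(x)
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def hilbert_spectrum(imfs, fs, n_bins):
    """Accumulate instantaneous amplitude into an n_bins x (n_samples - 1) matrix.

    Rows are frequency bins from 0 Hz up to fs / 2; columns are time steps.
    """
    n = len(imfs[0])
    nyquist = fs / 2.0
    matrix = [[0.0] * (n - 1) for _ in range(n_bins)]
    for imf in imfs:
        z = analytic_signal(imf)
        for t in range(n - 1):
            # Phase increment between neighbouring samples gives the
            # instantaneous frequency in Hz.
            dphi = cmath.phase(z[t + 1] * z[t].conjugate())
            freq = dphi * fs / (2.0 * math.pi)
            if 0.0 <= freq < nyquist:
                row = int(freq / nyquist * n_bins)
                matrix[row][t] += abs(z[t])
    return matrix


# Synthetic stand-ins for IMFs (assumption): a 6 Hz tone and a weaker 14 Hz tone.
fs = 64
demo_imfs = [
    [math.cos(2 * math.pi * 6 * t / fs) for t in range(64)],
    [0.5 * math.cos(2 * math.pi * 14 * t / fs) for t in range(64)],
]
spectrum = hilbert_spectrum(demo_imfs, fs=fs, n_bins=8)
```

With 8 bins of 4 Hz each, the energy of the 6 Hz tone falls in row 1 and that of the 14 Hz tone in row 3. A matrix of this kind is the sort of image-like input a convolutional network could consume; in practice the IMFs would come from a dedicated decomposition routine rather than from synthetic tones.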

Acknowledgements

The authors would like to thank Dr Yijun Yan for useful discussions and Calum MacLellan for kindly proofreading the paper.

Author information

Authors and Affiliations

Department of Electronic and Electrical Engineering, University of Strathclyde, Royal College Building, 204 George Street, Glasgow, G1 1XW, UK
Xiaoquan Li, Kaiqi Wang, John Soraghan & Jinchang Ren

Corresponding author

Correspondence to Jinchang Ren.

Editor information

Editors and Affiliations

University of A Coruña, A Coruña, Spain
Juan Romero
Aston University, Birmingham, UK
Anikó Ekárt
University of Coimbra, Coimbra, Portugal
Tiago Martins
University of Coimbra, Coimbra, Portugal
João Correia

About this paper

Cite this paper

Li, X., Wang, K., Soraghan, J., Ren, J. (2020). Fusion of Hilbert-Huang Transform and Deep Convolutional Neural Network for Predominant Musical Instruments Recognition. In: Romero, J., Ekárt, A., Martins, T., Correia, J. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2020. Lecture Notes in Computer Science, vol 12103. Springer, Cham. https://doi.org/10.1007/978-3-030-43859-3_6

DOI: https://doi.org/10.1007/978-3-030-43859-3_6
Published: 09 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43858-6
Online ISBN: 978-3-030-43859-3
eBook Packages: Computer Science, Computer Science (R0)
