Birdsong classification based on multi-feature fusion

Yan, Na; Chen, Aibin; Zhou, Guoxiong; Zhang, Zhiqiang; Liu, Xiangyong; Wang, Jianwu; Liu, Zhihua; Chen, Wenjie

doi:10.1007/s11042-021-11396-9

Published: 08 September 2021

Multimedia Tools and Applications, Volume 80, pages 36529–36547 (2021)

Na Yan¹, Aibin Chen¹˒³ (ORCID: orcid.org/0000-0003-4410-412X), Guoxiong Zhou¹, Zhiqiang Zhang², Xiangyong Liu⁴, Jianwu Wang⁵, Zhihua Liu¹ & Wenjie Chen¹

Abstract

Birdsong classification is important for monitoring bird populations in their habitats. To cope with birdsong datasets that have complex and diverse audio backgrounds, this paper introduces Chroma, an acoustic feature used in speech and music analysis. Chroma is spliced and fused with the features most commonly used for birdsong, the Log-Mel Spectrogram (LM) and Mel-Frequency Cepstral Coefficients (MFCC), to enrich the representational capacity of any single feature. In addition, because birdsong changes continuously and dynamically over time, a combined 3DCNN-LSTM model is proposed as the classifier to make the network more sensitive to time-varying birdsong information. Audio recordings of four bird species from the Xeno-Canto website were used to evaluate how LM, MFCC and Chroma should be fused to capture as much of the birdsong audio information as possible. The experimental results show that the LM-MFCC-C feature combination performs best, reaching a mean average precision (mAP) of 97.9%.

Similar content being viewed by others

A novel deep transfer learning models for recognition of birds sounds in different environment (Article, 14 January 2022)
Bird Audio Diarization with Faster R-CNN
Birdsong classification based on multi feature channel fusion (Article, 28 February 2022)

Funding

This work was supported in part by the Scientific Innovation Fund for Post-graduates of Central South University of Forestry and Technology (CX20192014) and the Hunan Key Laboratory of Intelligent Logistics Technology (2019TP1015).

Author information

Authors and Affiliations

Institute of Artificial Intelligence Application, College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
Na Yan, Aibin Chen, Guoxiong Zhou, Zhihua Liu & Wenjie Chen
Wildlife Conservation and Utilization Laboratory, College of Forestry, Central South University of Forestry and Technology, Changsha, China
Zhiqiang Zhang
Hunan Provincial Key Laboratory of Urban Forest Ecology, College of Life Science and Technology, Central South University of Forestry and Technology, Changsha, China
Aibin Chen
Hunan Zixing Artificial Intelligence Research Academy, Hunan Zixing, China
Xiangyong Liu
HuangFengQiao State-Owned Forest Farm, YouXian County, Hunan Province, China
Jianwu Wang

Corresponding author

Correspondence to Aibin Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, N., Chen, A., Zhou, G. et al. Birdsong classification based on multi-feature fusion. Multimed Tools Appl 80, 36529–36547 (2021). https://doi.org/10.1007/s11042-021-11396-9

Received: 08 September 2020
Revised: 18 January 2021
Accepted: 02 August 2021
Published: 08 September 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s11042-021-11396-9
