Abstract
Animal biodiversity is declining rapidly owing to causes such as habitat loss and degradation, invasive species, and environmental pollution. Recent advances in acoustic sensors offer a new way to monitor animals by analysing collected bioacoustic recordings. Accurate monitoring depends on a high-performing bioacoustic signal recognition model. However, because bioacoustic recordings are often made in open environments, various noise sources degrade audio quality and hamper automated analysis of animal sound recordings. Although many methods have been developed to address noise in different bioacoustic recordings, to the best of our knowledge no paper has yet reviewed and summarized them. The main aim of this paper is to provide a systematic survey of the existing literature on bioacoustic signal denoising. Based on an investigation of existing denoising methods for bioacoustic recordings, we discuss current challenges, possible opportunities, and future research directions.




Acknowledgements
This work is supported by the 111 Project. This work is also supported by Fundamental Research Funds for the Central Universities (Grant No: JUSRP11924) and Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment & Technology (Grant No: FM-2019-06). This work is partially supported by National Natural Science Foundation of China (Grant No: 61902154). This work is also partially supported by Natural Science Foundation of Jiangsu Province (Grant No: BK2019043526) and Jiangsu Province Post Doctoral Fund (Grant No: 2020Z430). We also want to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the institutional support.
Cite this article
Xie, J., Colonna, J.G. & Zhang, J. Bioacoustic signal denoising: a review. Artif Intell Rev 54, 3575–3597 (2021). https://doi.org/10.1007/s10462-020-09932-4
DOI: https://doi.org/10.1007/s10462-020-09932-4