Abstract
Existing speech enhancement algorithms perform poorly at low signal-to-noise ratios (SNRs). To address this problem, a speech enhancement algorithm based on binaural sound source localization and cosh measure filtering is proposed. First, the algorithm uses a sound source localization method based on head correlation functions and two-level deep learning to extract the spatial information of the binaural sound source and determine its spatial position. Beamforming is then used to remove noise arriving from directions other than that of the speech. Finally, Wiener filtering with a cosh measure based on a logarithmic relation removes the noise arriving from the same direction as the speech, which completes the enhancement. Experiments show that the proposed algorithm is more robust and denoises better than the comparison algorithms.
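The abstract gives no formulas, so the following is only a rough sketch of the generic building blocks it names, not the authors' implementation. It assumes a two-channel delay-and-sum beamformer with a known integer delay, the textbook Wiener gain xi / (1 + xi) for an a priori SNR xi, and the cosh spectral distance in the form given by Gray and Markel (the mean of cosh(ln(P/P_est)) - 1). All function names are hypothetical, and how the paper combines the cosh measure with the Wiener filter is not reproduced here.

```python
import math


def delay_and_sum(left, right, delay):
    """Align two channels and average them (illustrative beamformer).

    Assumes `right` lags `left` by `delay` whole samples (delay >= 0).
    """
    n = min(len(left), len(right) - delay)
    return [0.5 * (left[i] + right[i + delay]) for i in range(n)]


def wiener_gain(xi):
    """Textbook Wiener gain for one frequency bin with a priori SNR `xi`."""
    return xi / (1.0 + xi)


def cosh_measure(p_ref, p_est):
    """Cosh distance between two strictly positive power spectra.

    Averages cosh(ln(p_ref / p_est)) - 1 over the bins; it is zero for
    identical spectra and symmetric in its two arguments.
    """
    terms = [math.cosh(math.log(a / b)) - 1.0 for a, b in zip(p_ref, p_est)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Toy signal: the right channel is the left channel delayed by one sample.
    left = [1.0, 2.0, 3.0, 4.0]
    right = [0.0, 1.0, 2.0, 3.0]
    print(delay_and_sum(left, right, delay=1))
    print(wiener_gain(1.0))
    print(cosh_measure([2.0, 1.0], [1.0, 1.0]))
```

In this sketch the beamformer handles noise from other directions, while the gain and the distance measure stand in for the single-channel stage that handles noise from the speech direction.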
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61971016).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Ruwei Li, Fengnian Zhao, and Dongmei Pan. The first draft of the manuscript was written by Fengnian Zhao and Dongmei Pan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
About this article
Cite this article
Li, R., Zhao, F., Pan, D. et al. Speech Enhancement Based on Binaural Sound Source Localization and Cosh Measure Wiener Filtering. Circuits Syst Signal Process 41, 395–424 (2022). https://doi.org/10.1007/s00034-021-01786-7
DOI: https://doi.org/10.1007/s00034-021-01786-7