DCU-Net transient noise suppression based on joint spectrum estimation

Lan, Chaofeng; Zhao, Shilong; Zhang, Lei; Chen, Huan; Guo, Rui; Si, Zhenfei; Guo, Xiaoxia; Han, Chuang; Zhang, Meng

doi:10.1007/s11760-023-02541-y

Original Paper
Published: 25 April 2023

Signal, Image and Video Processing, Volume 17, pages 3265–3273 (2023)

Abstract

Transient noise has high short-time energy, a high degree of randomness, and a wide frequency-domain distribution, and it pollutes the signal only locally. Traditional denoising methods usually assume a particular relationship between speech and noise, and this assumption does not necessarily hold in real-life scenarios. As a result, traditional denoising methods do not suppress transient noise effectively. This paper therefore proposes a new denoising scheme. First, starting from the conventional optimally-modified log-spectral amplitude (OM-LSA) estimation algorithm, the minima controlled recursive averaging algorithm is replaced by the improved mean recurrence time algorithm, and the transient noise spectrum is estimated. Second, transient noise segments are identified using thresholds and fed into a deep complex-valued U-Net (DCU-Net) for speech enhancement. Third, the enhanced results are inserted back into the original sequence to reconstruct the denoised speech signal. Finally, experiments are carried out using speech from the Voice Bank corpus and a self-built noise dataset. The results show that, at 0 dB, −5 dB, and −10 dB, the proposed method achieves a higher segmental signal-to-noise ratio, perceptual speech quality, and short-time objective intelligibility than the traditional OM-LSA algorithm. At a signal-to-noise ratio of −10 dB, the segmental signal-to-noise ratio improves by 9.8%. These results indicate that the proposed method can effectively suppress transient noise at low signal-to-noise ratios while also improving speech quality.
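The detect, enhance, and re-insert flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the thresholds or the network details, so the sketch swaps in a plain frame-energy threshold (a multiple of the median frame energy) for the OM-LSA-based transient spectrum estimate, and a trivial attenuator for the trained DCU-Net. The names `detect_transient_frames`, `suppress_transients`, and `placeholder_enhancer`, and the values `frame_len=256` and `ratio=4.0`, are all hypothetical.

```python
import numpy as np


def frame_energies(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames and return per-frame energy."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sum(frames ** 2, axis=1)


def detect_transient_frames(signal, frame_len=256, ratio=4.0):
    """Flag frames whose energy exceeds ratio * median frame energy.

    Assumption: this simple energy rule stands in for the thresholding on
    the estimated transient noise spectrum mentioned in the abstract.
    """
    energy = frame_energies(signal, frame_len)
    return energy > ratio * np.median(energy)


def suppress_transients(signal, enhance, frame_len=256, ratio=4.0):
    """Enhance only the flagged frames and write them back at their original positions."""
    out = np.array(signal, dtype=float)  # work on a copy of the input
    flags = detect_transient_frames(out, frame_len, ratio)
    for i in np.flatnonzero(flags):
        start, stop = i * frame_len, (i + 1) * frame_len
        out[start:stop] = enhance(out[start:stop])
    return out, flags


def placeholder_enhancer(segment):
    """Hypothetical stand-in for a trained enhancement network: plain attenuation."""
    return 0.1 * segment


# Synthetic demo: low-level background signal with one loud burst in frame 4.
rng = np.random.default_rng(0)
noisy = 0.1 * rng.standard_normal(4096)
noisy[1024:1280] += 5.0 * rng.standard_normal(256)

denoised, flags = suppress_transients(noisy, placeholder_enhancer)
print(np.flatnonzero(flags))
```

Only the flagged frames are passed to the enhancer, and every other sample is copied through unchanged, which mirrors the idea that transient noise pollutes the signal only locally. The median-based threshold assumes that fewer than half of the frames contain transients.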

Availability of data and materials

All the data included in this study are available upon request by contacting the corresponding author.

Not applicable.

Funding

This research was funded by the Natural Science Foundation of Heilongjiang Province (No. LH2020F033), the National Natural Science Youth Foundation of China (No. 11804068), and the Research Project of the Heilongjiang Province Health Commission (20221111001069).

Author information

Chaofeng Lan and Zhenfei Si have contributed equally to this work.

Authors and Affiliations

School of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, 150080, China
Chaofeng Lan, Shilong Zhao, Rui Guo, Zhenfei Si, Xiaoxia Guo & Chuang Han
Beidahuang Industry Group General Hospital, Harbin, 150088, China
Lei Zhang
China Ship Development and Design Center, Wuhan, 430064, China
Huan Chen
School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China
Meng Zhang

Contributions

CL contributed to the conception of the study and contributed significantly to the analysis and manuscript preparation; LZ and SZ made important contributions to adjusting the structure, English editing, and revising the manuscript; ZS performed the experiments and the data analyses and wrote the original manuscript; HC made significant contributions to the structure, English editing, and data analysis of the paper; MZ carried out the data analysis and the major revision of the paper; and XG and CH made important contributions to proofreading the English.

Corresponding authors

Correspondence to Lei Zhang, Chuang Han or Meng Zhang.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lan, C., Zhao, S., Zhang, L. et al. DCU-Net transient noise suppression based on joint spectrum estimation. SIViP 17, 3265–3273 (2023). https://doi.org/10.1007/s11760-023-02541-y

Received: 24 December 2022
Revised: 25 January 2023
Accepted: 25 February 2023
Published: 25 April 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11760-023-02541-y
