Improving Low-Latency Mono-Channel Speech Enhancement by Compensation Windows in STFT Analysis

  • Conference paper
Complex Networks & Their Applications XII (COMPLEX NETWORKS 2023)

Part of the book series: Studies in Computational Intelligence (SCI, volume 1141)


Abstract

Speech enhancement is a key component of voice communication technology, serving as an important pre-processing step for systems such as acoustic echo cancellation, speech separation, and speech conversion. A low-latency speech enhancement algorithm is desirable, since a long latency delays the entire system's response. In STFT-based systems, reducing algorithmic latency by using smaller STFT window sizes leads to significant degradation in speech quality. By introducing a simple additional compensation window alongside the original short main window in the STFT analysis step, we preserve signal quality comparable to that of the original high-latency system while reducing the algorithmic latency from 42 ms to 5 ms. Experiments on the full-band VCD dataset and a large full-band Microsoft internal dataset show the effectiveness of the proposed method.
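The paper's body is not reproduced on this page, but the latency arithmetic behind the abstract can be illustrated. In an overlap-add STFT pipeline, the algorithmic latency is roughly one synthesis window of buffered future samples, so the window length fixes the latency. The sketch below is a minimal illustration, not the paper's implementation: the 48 kHz full-band sample rate, the specific window lengths, and the `dual_window_analysis` helper (with its placeholder window shapes) are all assumptions for illustration.

```python
import numpy as np

def stft_algorithmic_latency_ms(win_len: int, sample_rate: int) -> float:
    """Algorithmic latency of an overlap-add STFT system: output for a
    frame cannot be emitted until a full window of samples is buffered."""
    return 1000.0 * win_len / sample_rate

SR = 48_000  # full-band audio sample rate (assumed)

# ~42 ms corresponds to a 2016-sample window at 48 kHz; ~5 ms to 240 samples.
print(stft_algorithmic_latency_ms(2016, SR))  # 42.0
print(stft_algorithmic_latency_ms(240, SR))   # 5.0

def dual_window_analysis(frame, main_win, comp_win):
    """Hypothetical illustration of the abstract's idea: analyze the same
    short frame with a short main window plus an auxiliary compensation
    window, producing two spectra per hop for the enhancer to consume
    jointly. The actual window designs are defined in the paper."""
    assert len(frame) == len(main_win) == len(comp_win)
    return np.fft.rfft(frame * main_win), np.fft.rfft(frame * comp_win)

frame = np.random.randn(240)                      # one 5 ms frame at 48 kHz
main_spec, comp_spec = dual_window_analysis(
    frame, np.hanning(240), np.ones(240))
```

Only the window-length-to-latency relation above is general to overlap-add STFT processing; how the compensation spectrum is combined with the main spectrum is specific to the paper.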

Work performed while Minh N. Bui was a research intern at Microsoft.



Author information

Correspondence to Minh N. Bui.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bui, M.N., Tran, D.N., Koishida, K., Tran, T.D., Chin, P. (2024). Improving Low-Latency Mono-Channel Speech Enhancement by Compensation Windows in STFT Analysis. In: Cherifi, H., Rocha, L.M., Cherifi, C., Donduran, M. (eds) Complex Networks & Their Applications XII. COMPLEX NETWORKS 2023. Studies in Computational Intelligence, vol 1141. Springer, Cham. https://doi.org/10.1007/978-3-031-53468-3_31

  • DOI: https://doi.org/10.1007/978-3-031-53468-3_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53467-6

  • Online ISBN: 978-3-031-53468-3

  • eBook Packages: Engineering (R0)
