Research article · AIPR Conference Proceedings · DOI: 10.1145/3573942.3573975

Solving Size and Performance Dilemma by Reversible and Invertible Recurrent Network for Speech Enhancement

Published: 16 May 2023

Abstract

Reducing the number of parameters while improving system performance is widely regarded as a dilemma: shrinking a model typically degrades its performance, while improving performance usually requires more parameters. To resolve this dilemma, we propose a reversible and invertible recurrent (RAIR) network. First, we construct a reversible dual-path architecture that avoids information loss for two arbitrary functions, F and G: regardless of the choice of F and G, and no matter how small the model is, feature maps pass through the network without any loss of information. Second, we adopt an invertible 1x1 convolution to improve the remixing of channel information. Third, within this reversible architecture we employ a dual-path recurrence (DPR) block, which operates separately along the frequency and time dimensions, as the F function, and a 3x3 convolution as the G function; this reduces the number of parameters dramatically. Although the model is tiny, experiments on Voice Bank + DEMAND show that our reversible and invertible recurrent architecture improves all performance metrics: COVL from 3.57 to 3.78, wideband PESQ from 2.94 to 3.15, and STOI from 0.947 to 0.951. The proposed model achieves state-of-the-art results with only 190K parameters; to the best of our knowledge, it is the state-of-the-art model with the smallest size.
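The reversible coupling and invertible 1x1 convolution described in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the weight matrices `WF` and `WG` and the placeholder functions `F` and `G` below are hypothetical stand-ins for the DPR block and the 3x3 convolution, chosen only to show that, for the additive-coupling structure, reconstruction is exact no matter what F and G are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed weights for two arbitrary (even non-invertible) functions F and G.
# The additive-coupling structure alone guarantees exact invertibility.
WF = rng.standard_normal((4, 4))
WG = rng.standard_normal((4, 4))
F = lambda x: np.tanh(x @ WF)            # stand-in for the paper's DPR block
G = lambda x: np.maximum(0.0, x @ WG)    # stand-in for the 3x3 convolution

def couple_forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def couple_inverse(y1, y2):
    x2 = y2 - G(y1)          # undo the second addition first
    x1 = y1 - F(x2)          # then the first
    return x1, x2

# Invertible "1x1 convolution": per-position channel remixing with an
# invertible matrix W, undone exactly with W^{-1}.
W = rng.standard_normal((8, 8)) + 8.0 * np.eye(8)   # well-conditioned
mix = lambda z: z @ W
unmix = lambda z: z @ np.linalg.inv(W)

x = rng.standard_normal((5, 8))          # (time, channels) feature map
x1, x2 = x[:, :4], x[:, 4:]              # split along the channel axis
z = mix(np.concatenate(couple_forward(x1, x2), axis=1))        # forward
x1_rec, x2_rec = couple_inverse(*np.split(unmix(z), 2, axis=1))  # inverse
assert np.allclose(np.concatenate([x1_rec, x2_rec], axis=1), x)
```

Because activations can be reconstructed exactly from the output, such a network loses no information in the forward pass, which is what lets the paper keep performance high while shrinking the parameter count.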



    Published In

    AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
    September 2022
    1221 pages
    ISBN:9781450396899
    DOI:10.1145/3573942

Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Invertible Network
    2. Noise Reduction
    3. Reversible Network
    4. Speech Enhancement

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    AIPR 2022
