ABSTRACT
To address the problem that popular encoder-decoder-based monaural speech enhancement models do not make full use of full-scale features, a full-scale-connected speech enhancement model, FSC-SENet, is proposed. First, a speech enhancement model is constructed on the CRN architecture: a convolutional encoder and decoder extract features and recover the speech signal, while LSTM modules at the bottleneck of the model capture temporal dependencies. Then, a full-scale connection method and a multi-feature dynamic fusion mechanism are proposed, so that the decoder can exploit features at every scale when recovering clean speech. Experimental results on the TIMIT corpus show that, compared with CRN, FSC-SENet improves the PESQ score by 0.39 and the STOI score by 2.8% on seen noise, and the PESQ score by 0.43 and the STOI score by 3.1% on unseen noise, demonstrating that the proposed full-scale connection and dynamic feature fusion mechanism give CRN better speech enhancement performance.
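The core idea of a full-scale connection with dynamic fusion, as popularized by UNet 3+, is that each decoder stage receives feature maps from *every* encoder scale, resampled to a common resolution and combined with learned weights. The following NumPy sketch is purely illustrative and is not the paper's implementation: the function names (`resize_time`, `full_scale_fuse`), the nearest-neighbor resampling, and the softmax-weighted sum standing in for the dynamic fusion mechanism are all assumptions made for the sake of a minimal, runnable example.

```python
import numpy as np

def resize_time(feat, target_len):
    # Nearest-neighbor resampling of a (channels, frames) feature
    # map along the time axis, so maps from different encoder
    # scales can be aligned to one common resolution.
    idx = (np.arange(target_len) * feat.shape[-1] / target_len).astype(int)
    return feat[..., idx]

def full_scale_fuse(encoder_feats, scale, weights):
    # Bring every encoder feature map to the resolution of the
    # target decoder scale, then combine them with softmax-normalized
    # "dynamic" weights (a stand-in for the learned fusion mechanism).
    target_len = encoder_feats[scale].shape[-1]
    resized = [resize_time(f, target_len) for f in encoder_feats]
    w = np.exp(weights) / np.exp(weights).sum()
    return sum(wi * fi for wi, fi in zip(w, resized))

# Toy feature maps at three scales (channels x frames).
feats = [np.random.randn(4, 100), np.random.randn(4, 50), np.random.randn(4, 25)]
fused = full_scale_fuse(feats, scale=1, weights=np.zeros(3))
print(fused.shape)  # (4, 50)
```

In a real network the resampling would be strided convolution or transposed convolution and the fusion weights would be predicted per frame, but the data flow — gather all scales, align, weight, sum — is the same.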
REFERENCES
- Wang Y, Wang D. Towards scaling up classification-based speech separation. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(7): 1381-1390.
- Xu Y, Du J, Dai L-R, et al. A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 23(1): 7-19.
- Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. MICCAI 2015: 234-241.
- Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
- Jansson A, Humphrey E, Montecchio N, et al. Singing voice separation with deep U-Net convolutional networks. 18th International Society for Music Information Retrieval Conference (ISMIR), 2017: 23-27.
- Stoller D, Ewert S, Dixon S. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation. ISMIR 2018: 334-340.
- Soni M H, Shah N, Patil H A. Time-frequency masking-based speech enhancement using generative adversarial network. ICASSP 2018: 5039-5043.
- Park S R, Lee J W. A fully convolutional neural network for speech enhancement. Interspeech 2017: 1993-1997.
- Tan K, Wang D. A convolutional recurrent neural network for real-time speech enhancement. Interspeech 2018: 3229-3233.
- Li A, Zheng C, Fan C, et al. A recursive network with dynamic attention for monaural speech enhancement. Interspeech 2020: 2422-2426.
- Huang H, Lin L, Tong R, et al. UNet 3+: A full-scale connected UNet for medical image segmentation. ICASSP 2020: 1055-1059.
- Garofolo J S, Lamel L F, Fisher W M, et al. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST Speech Disc 1-1.1, 1993.
- Hu G, Wang D. A tandem algorithm for pitch estimation and voiced speech segregation. IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(8): 2067-2079.
- Varga A, Steeneken H J M. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 1993, 12(3): 247-251.
- Rix A W, Beerends J G, Hollier M P, et al. Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs. ICASSP 2001: 749-752.
- Taal C H, Hendriks R C, Heusdens R, et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(7): 2125-2136.