DOI: 10.1145/3501409.3501474

research-article

Research on Speech Enhancement based on Full-scale Connection

Published: 31 December 2021

ABSTRACT

To address the problem that popular encoder-decoder-based monaural speech enhancement models do not make full use of full-scale features, this paper proposes FSC-SENet, a speech enhancement model with full-scale feature connections. First, a speech enhancement model is constructed on the CRN architecture: a convolutional encoder and decoder extract features and recover the speech signal, and LSTM modules extract temporal features at the model's bottleneck. A full-scale connection method and a multi-feature dynamic fusion mechanism are then proposed, so that the decoder can fully exploit full-scale features when recovering clean speech. Experimental results on the TIMIT corpus show that, compared with CRN, our FSC-SENet improves the PESQ score by 0.39 and the STOI score by 2.8% under seen-noise conditions, and the PESQ score by 0.43 and the STOI score by 3.1% under unseen-noise conditions, demonstrating that the proposed full-scale connection and dynamic feature fusion mechanism give CRN better speech enhancement performance.
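The full-scale connection and dynamic fusion described in the abstract can be sketched roughly as follows: features from every encoder/decoder scale are resized to a common decoder resolution and combined by softmax-normalized weights. This is a minimal illustrative sketch; the helper `resize_time_freq` and the per-scale scores are assumptions, not the paper's exact formulation.

```python
import numpy as np

def resize_time_freq(feat, target_f):
    """Nearest-neighbour resize of a (T, F) feature map along the frequency
    axis so features from all scales match the decoder resolution.
    (Hypothetical helper; the paper's exact resampling is not specified.)"""
    T, F = feat.shape
    idx = np.arange(target_f) * F // target_f
    return feat[:, idx]

def dynamic_fusion(features, scores):
    """Multi-feature dynamic fusion (sketch): a softmax over per-scale
    scores yields weights, and a weighted sum combines the aligned
    full-scale features into one decoder input."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, features))

# Toy full-scale features from three scales, T = 4 frames each.
rng = np.random.default_rng(0)
scales = [rng.random((4, 64)), rng.random((4, 32)), rng.random((4, 16))]

aligned = [resize_time_freq(f, 32) for f in scales]   # bring all scales to F = 32
fused = dynamic_fusion(aligned, np.array([0.2, 0.5, 0.3]))
print(fused.shape)  # (4, 32)
```

In the actual FSC-SENet, the fusion weights would be learned rather than fixed, and the resized features feed each decoder stage alongside the LSTM bottleneck output.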


Published in:
EITCE '21: Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering
October 2021, 1723 pages
ISBN: 9781450384322
DOI: 10.1145/3501409

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

Acceptance Rates

EITCE '21 paper acceptance rate: 294 of 531 submissions, 55%. Overall acceptance rate: 508 of 972 submissions, 52%.
