DOI: 10.1145/3452940.3453037
research-article

Single Channel Target Speaker Extraction Based on Deep Learning

Published: 17 May 2021

ABSTRACT

Single-channel target speaker extraction aims to emulate human selective auditory attention by extracting the voice of a target speaker from a multi-speaker environment. For this scenario, we propose a time-domain target speaker extraction model. It transforms the mixed speech into embedding coefficients, so the speech signal does not need to be decomposed into a magnitude spectrum and a phase spectrum. The network has four components: the speaker encoder, the speech encoder, the speaker extractor, and the speech decoder. Specifically, the speech encoder transforms the mixed speech into embedding coefficients, and the speaker encoder learns a speaker embedding that represents the target speaker. The speaker extractor takes the embedding coefficients and the target speaker's embedding as input and estimates a receptive mask. Finally, the speech decoder reconstructs the target speaker's speech from the masked embedding coefficients. Experimental results show that, under open evaluation conditions, this method improves on the best baseline by 45.6% in signal-to-distortion ratio (SDR) and by 47.5% in scale-invariant signal-to-distortion ratio (SI-SDR).
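The data flow described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify layer types, sizes, or training, so each component is replaced by a simple stand-in (a random basis with ReLU for the speech encoder, a time average for the speaker encoder, an elementwise sigmoid for the speaker extractor, overlap-add for the speech decoder), and all function names are hypothetical. The `si_sdr` helper follows the standard scale-invariant SDR definition, in which the estimate is compared against the target scaled by their least-squares projection.

```python
import math
import random

def encode(signal, basis, win, hop):
    """Speech encoder stand-in: project each frame onto the basis, then ReLU."""
    starts = range(0, len(signal) - win + 1, hop)
    return [[max(0.0, sum(b * signal[s + j] for j, b in enumerate(row)))
             for row in basis] for s in starts]

def speaker_embedding(ref_coeffs):
    """Speaker encoder stand-in: average the reference coefficients over time."""
    return [sum(col) / len(ref_coeffs) for col in zip(*ref_coeffs)]

def estimate_mask(mix_coeffs, spk_emb):
    """Speaker extractor stand-in: one sigmoid mask value per coefficient."""
    # Both inputs are non-negative, so the exponent is never positive.
    return [[1.0 / (1.0 + math.exp(-c * e)) for c, e in zip(frame, spk_emb)]
            for frame in mix_coeffs]

def decode(coeffs, basis, win, hop, length):
    """Speech decoder stand-in: overlap-add the weighted basis vectors."""
    out = [0.0] * length
    for t, frame in enumerate(coeffs):
        for k, c in enumerate(frame):
            for j in range(win):
                out[t * hop + j] += c * basis[k][j]
    return out

def extract(mixture, reference, basis, win, hop):
    """Run the four stages; the mask is applied to coefficients, not spectra."""
    mix = encode(mixture, basis, win, hop)
    emb = speaker_embedding(encode(reference, basis, win, hop))
    mask = estimate_mask(mix, emb)
    masked = [[m * c for m, c in zip(mf, cf)] for mf, cf in zip(mask, mix)]
    return decode(masked, basis, win, hop, len(mixture)), mask

def si_sdr(estimate, target):
    """Scale-invariant SDR in dB (undefined if the estimate matches exactly)."""
    dot = sum(e * t for e, t in zip(estimate, target))
    energy = sum(t * t for t in target)
    scaled = [dot / energy * t for t in target]  # projection onto the target
    noise = sum((e - s) ** 2 for e, s in zip(estimate, scaled))
    return 10.0 * math.log10(sum(s * s for s in scaled) / noise)

# Toy inputs: random signals and an untrained random basis (illustrative only).
random.seed(0)
WIN, HOP, N_BASIS = 8, 4, 16
basis = [[random.gauss(0, 1) for _ in range(WIN)] for _ in range(N_BASIS)]
mixture = [random.gauss(0, 1) for _ in range(64)]
reference = [random.gauss(0, 1) for _ in range(32)]
estimate, mask = extract(mixture, reference, basis, WIN, HOP)
```

Because the basis is random and nothing is trained, the output here is not a meaningful separation; the sketch only shows how the mixture, the reference utterance, and the mask pass through the four stages, and how an SI-SDR score would be computed against a clean target signal.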

          • Published in

            ICITEE '20: Proceedings of the 3rd International Conference on Information Technologies and Electrical Engineering
            December 2020
            687 pages
            ISBN:9781450388665
            DOI:10.1145/3452940

            Copyright © 2020 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • research-article
            • Research
            • Refereed limited