DOI: 10.1145/3622896.3622923
research-article

A Unified Mixed-Bandwidth ASR Framework with Generative Adversarial Network

Published: 03 October 2023

ABSTRACT

The recognition of mixed-bandwidth audio presents a challenge for both academic and industrial fields, with potentially greater implications for the latter. In this paper, we present a unified ASR architecture for recognizing mixed-bandwidth audio. We propose using a generative adversarial network with two discriminators so that a single system can recognize audio recorded at mixed sampling rates while preserving the performance of the ASR system. An adaptive training process applied to the trained generator and the ASR system improves performance further. We conduct experiments on the LibriSpeech dataset and demonstrate that our method successfully recognizes mixed-bandwidth audio and improves the accuracy of the ASR system by 3.65% on the narrowband data. Overall, the proposed unified ASR architecture provides a promising solution for the recognition of mixed-bandwidth audio in various settings.
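To make "narrowband data" concrete: LibriSpeech is distributed at 16 kHz, so narrowband test audio generally has to be simulated from it. The abstract does not say how the authors did this. The sketch below shows one common approach, assumed here purely for illustration: low-pass filter, decimate to 8 kHz, then interpolate back to 16 kHz so that both bandwidths share one sampling rate. The 8 kHz rate, the 3.6 kHz cutoff, the 101-tap filter, and all function names are assumptions of this sketch, not details from the paper.

```python
import math

WIDEBAND_HZ = 16_000    # LibriSpeech sampling rate
NARROWBAND_HZ = 8_000   # assumed telephone-style rate (not stated in the abstract)


def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass FIR filter, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Zero-padded convolution that returns as many samples as it receives."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k                   # centre the filter on sample i
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out


def simulate_narrowband(wideband):
    """16 kHz -> 8 kHz -> 16 kHz round trip that discards content above ~4 kHz."""
    # The 3.6 kHz cutoff leaves a guard band below the 4 kHz Nyquist limit.
    taps = lowpass_taps(cutoff_hz=3_600, sample_rate_hz=WIDEBAND_HZ)
    narrow = fir_filter(wideband, taps)[::2]   # anti-alias, then keep every 2nd sample
    stuffed = [0.0] * (2 * len(narrow))
    stuffed[::2] = [2.0 * s for s in narrow]   # zero-stuff; gain of 2 restores the level
    return fir_filter(stuffed, taps)[:len(wideband)]


def tone_power(signal, freq_hz, sample_rate_hz):
    """Power of one frequency component, by correlating with a complex tone."""
    step = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(s * math.cos(step * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(step * n) for n, s in enumerate(signal))
    return (re * re + im * im) / len(signal) ** 2


if __name__ == "__main__":
    # Synthetic stand-in for speech: one tone inside and one outside the narrow band.
    wide = [math.sin(2 * math.pi * 1_000 * n / WIDEBAND_HZ)
            + math.sin(2 * math.pi * 6_000 * n / WIDEBAND_HZ) for n in range(1_600)]
    narrow = simulate_narrowband(wide)
    for freq in (1_000, 6_000):
        ratio = tone_power(narrow, freq, WIDEBAND_HZ) / tone_power(wide, freq, WIDEBAND_HZ)
        print(f"{freq} Hz power kept: {ratio:.4f}")
```

A signal processed this way keeps the 16 kHz sampling rate but carries almost no energy above 4 kHz, which is the mismatch a mixed-bandwidth recognizer has to cope with. In practice a library resampler would replace the hand-written filter.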

Published in

CCRIS '23: Proceedings of the 2023 4th International Conference on Control, Robotics and Intelligent System
August 2023
215 pages
ISBN: 9798400708190
DOI: 10.1145/3622896

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited