ABSTRACT
The recognition of mixed-bandwidth audio presents a challenge for both academic and industrial settings, with potentially greater implications for the latter. In this paper, we present a unified ASR architecture for mixed-bandwidth audio recognition. We propose to use a generative adversarial network with two discriminators, enabling the system to recognize audio at mixed sampling rates while preserving ASR performance. Through adaptive joint training of the trained generator and the ASR system, performance can be further improved. We conduct experiments on the LibriSpeech dataset and demonstrate that our method successfully recognizes mixed-bandwidth audio and improves the accuracy of the ASR system by 3.65% on narrowband data. Overall, the proposed unified ASR architecture provides a promising solution for the recognition of mixed-bandwidth audio in various settings.
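The abstract does not spell out how mixed-bandwidth data is obtained. As background only, a common way to construct such data is to simulate 8 kHz narrowband audio from 16 kHz wideband recordings by decimating and resampling back; the sketch below (NumPy, all function names illustrative, not from the paper) does this naively and verifies that high-frequency content is lost:

```python
import numpy as np

def simulate_narrowband(wav_16k: np.ndarray) -> np.ndarray:
    """Simulate 8 kHz narrowband audio from a 16 kHz signal by
    decimating (keep every other sample) and linearly interpolating
    back onto the 16 kHz grid. A real pipeline would apply an
    anti-aliasing low-pass filter before decimation."""
    wav_8k = wav_16k[::2]                 # naive 2x decimation
    t_16k = np.arange(len(wav_16k))
    t_8k = np.arange(0, len(wav_16k), 2)
    return np.interp(t_16k, t_8k, wav_8k) # back to 16 kHz length

def band_energy(wav: np.ndarray, sr: int, lo_hz: float) -> float:
    """Spectral energy above lo_hz, used to check that the simulated
    narrowband signal has lost its high-frequency content."""
    spec = np.abs(np.fft.rfft(wav)) ** 2
    freqs = np.fft.rfftfreq(len(wav), d=1.0 / sr)
    return float(spec[freqs >= lo_hz].sum())

# Example: one second at 16 kHz with 1 kHz and 6 kHz components.
sr = 16000
t = np.arange(sr) / sr
wide = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
narrow = simulate_narrowband(wide)
```

Paired wideband/narrowband data of this kind is what lets a GAN generator be trained to map narrowband features toward wideband ones, with discriminators judging the result.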
Index Terms
- A Unified Mixed-Bandwidth ASR Framework with Generative Adversarial Network