Speech Audio Super-Resolution for Speech Recognition

Li, Xinyu; Chebiyyam, Venkata; Kirchhoff, Katrin

doi:10.21437/Interspeech.2019-3043

Automatic bandwidth extension (restoring high-frequency information from low-sample-rate audio) has a number of applications in speech processing. We introduce an end-to-end, deep-learning-based system for speech bandwidth extension for use in a downstream automatic speech recognition (ASR) system. Specifically, we propose a conditional generative adversarial network enriched with ASR-specific loss functions, designed to upsample the speech audio while maintaining good ASR performance. Evaluations on the Speech Commands dataset and the LibriSpeech corpus show that our approach outperforms a number of traditional bandwidth extension methods with respect to word error rate.
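As a rough illustration of the problem stated in the abstract (and not the authors' system), the following sketch shows why bandwidth extension is needed: once audio is reduced to a lower sample rate, plain resampling back to the original rate cannot recover the content above the lower Nyquist frequency. The 16 kHz and 8 kHz rates, the two test tones, and the helper names `band_energy` and `fft_resample` are all assumptions made for this example; it uses only NumPy FFT routines.

```python
import numpy as np


def band_energy(signal, sample_rate, f_lo, f_hi):
    """Sum of squared FFT magnitudes for frequencies in [f_lo, f_hi) Hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(np.sum(np.abs(spectrum[mask]) ** 2))


def fft_resample(signal, num_out):
    """Resample to num_out samples by truncating or zero-padding the spectrum.

    Truncation acts as an ideal low-pass filter when downsampling;
    zero-padding adds no new frequency content when upsampling.
    """
    spectrum = np.fft.rfft(signal)
    out_bins = num_out // 2 + 1
    new_spectrum = np.zeros(out_bins, dtype=complex)
    n = min(out_bins, len(spectrum))
    new_spectrum[:n] = spectrum[:n]
    # Rescale so that tone amplitudes are preserved across lengths.
    return np.fft.irfft(new_spectrum, n=num_out) * (num_out / len(signal))


# Assumed setup: one second of "wideband" audio at 16 kHz with a
# low tone (1 kHz) and a high tone (6 kHz).
wide_rate, narrow_rate = 16000, 8000
t = np.arange(wide_rate) / wide_rate
wide = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)

# Simulate low-sample-rate audio, then naively upsample it again.
narrow = fft_resample(wide, narrow_rate)
restored = fft_resample(narrow, wide_rate)

low_before = band_energy(wide, wide_rate, 0, 4000)
low_after = band_energy(restored, wide_rate, 0, 4000)
high_before = band_energy(wide, wide_rate, 4000, 8000)
high_after = band_energy(restored, wide_rate, 4000, 8000)

print(f"0-4 kHz energy kept:  {low_after / low_before:.3f}")
print(f"4-8 kHz energy kept:  {high_after / high_before:.3e}")
```

In this toy setting the 1 kHz tone survives the round trip while the 6 kHz tone is lost entirely. That missing upper band is what a bandwidth extension model, such as the conditional generative adversarial network described above, has to predict from the low-band signal.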

Cite as: Li, X., Chebiyyam, V., Kirchhoff, K. (2019) Speech Audio Super-Resolution for Speech Recognition. Proc. Interspeech 2019, 3416-3420, doi: 10.21437/Interspeech.2019-3043

@inproceedings{li19q_interspeech,
  author={Xinyu Li and Venkata Chebiyyam and Katrin Kirchhoff},
  title={{Speech Audio Super-Resolution for Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3416--3420},
  doi={10.21437/Interspeech.2019-3043}
}