ISCA Archive Interspeech 2019

A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition

Li Chai, Jun Du, Chin-Hui Lee

A challenging problem in robust automatic speech recognition (ASR) is how to measure the quality of a speech enhancement algorithm without calculating the word error rate (WER), which is costly because it requires manual transcriptions, language modeling, and decoding. In this study, a novel cross-entropy-guided (CEG) measure is proposed for assessing whether enhanced speech produced by a speech enhancement algorithm will yield good performance in robust ASR. CEG is computed in three consecutive steps: extracting low-level representations via feature extraction, obtaining high-level representations via the nonlinear mapping of the acoustic model, and calculating the final CEG between the high-level representations of clean and enhanced speech. Specifically, the state posterior probabilities output by the neural network acoustic model are adopted as the high-level representations, and a cross-entropy criterion is used to calculate CEG. Experimental results show that CEG consistently yields the highest correlations with WER and achieves the most accurate assessment of ASR performance when compared to distortion measures based on human auditory perception and to an acoustic confidence measure. Potentially, CEG could be adopted to guide the parameter optimization of deep-learning-based speech enhancement algorithms to further improve ASR performance.
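
The final step described in the abstract, a cross-entropy between the state posteriors of clean and enhanced speech, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the following are assumptions rather than the authors' definition: the clean-speech posteriors act as the reference distribution, the per-frame cross-entropies are averaged over the utterance, and the function name `ceg_score` and the `eps` clipping constant are hypothetical.

```python
import numpy as np


def ceg_score(clean_post, enh_post, eps=1e-12):
    """Frame-averaged cross-entropy between two posterior sequences.

    Assumption (not stated in the abstract): clean posteriors are the
    reference distribution and frames are averaged with equal weight.
    Both inputs have shape (num_frames, num_states), rows summing to 1.
    """
    clean = np.asarray(clean_post, dtype=float)
    enh = np.asarray(enh_post, dtype=float)
    if clean.ndim != 2 or clean.shape != enh.shape:
        raise ValueError("expected two (num_frames, num_states) arrays")
    # Clip to avoid log(0) when the enhanced posterior is exactly zero.
    log_enh = np.log(np.clip(enh, eps, 1.0))
    # Per-frame cross-entropy: -sum_s p_clean(s) * log p_enh(s).
    frame_ce = -np.sum(clean * log_enh, axis=1)
    return float(frame_ce.mean())
```

Under these assumptions a lower score means the acoustic model sees the enhanced speech as closer to the clean speech. In practice the two posterior arrays would come from passing the features of the clean and enhanced utterances through the same acoustic model, which this sketch does not include.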


doi: 10.21437/Interspeech.2019-2511

Cite as: Chai, L., Du, J., Lee, C.-H. (2019) A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition. Proc. Interspeech 2019, 3431-3435, doi: 10.21437/Interspeech.2019-2511

@inproceedings{chai19b_interspeech,
  author={Li Chai and Jun Du and Chin-Hui Lee},
  title={{A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3431--3435},
  doi={10.21437/Interspeech.2019-2511}
}