Abstract:
We present a new method to augment the correct transcript from automatic speech recognition (ASR) output containing multiple hypotheses. The error-prone ASR process is treated as a black box and modeled as a noisy channel at the phoneme level. The probabilities of the individual phoneme errors are assigned according to phonetic confusability. We score potential candidate hypotheses by their posterior probability of being the channel input, given the competing ASR hypotheses as the observed output. The resulting scores provide useful information not captured by traditional confidence measures. We investigated the usefulness of the method for rescoring, re-ranking and word error detection. The method alone is not powerful enough to improve the recognition results, but by employing a decision tree classifier it is possible to isolate cases where the method works very well. Our results show that combining it with other knowledge sources and postprocessing techniques can lead to promising improvements.
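The scoring idea can be illustrated with a small sketch. The abstract does not give the confusion probabilities, the alignment procedure, or how the competing hypotheses are combined. The sketch therefore makes several assumptions. It uses a toy phoneme-class table in place of real phonetic confusability. It sums the channel probability over all edit alignments with a forward recursion. It treats the ASR hypotheses as conditionally independent observations of the same input and uses a uniform prior by default. All names (`PHONE_CLASS`, `sub_prob`, `channel_prob`, `posteriors`, `phones`) and all numeric values are hypothetical illustrations, not the authors' model.

```python
# Hypothetical sketch of noisy-channel candidate scoring at the phoneme level.
# Confusion values and phoneme classes are illustrative assumptions only.

# Toy phoneme classes standing in for phonetic confusability (assumption).
PHONE_CLASS = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop", "k": "stop", "g": "stop",
    "s": "fric", "z": "fric", "f": "fric", "v": "fric",
    "m": "nasal", "n": "nasal",
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
}

P_MATCH = 0.9        # phoneme passes through the channel unchanged
P_SAME_CLASS = 0.05  # confusion with a phonetically similar phoneme
P_OTHER = 0.005      # confusion with a dissimilar phoneme
P_DEL = 0.01         # phoneme deleted by the channel
P_INS = 0.01         # spurious phoneme inserted by the channel


def phones(s):
    """Split a space-separated phoneme string into a hashable tuple."""
    return tuple(s.split())


def sub_prob(a, b):
    """Probability that input phoneme a is emitted as phoneme b."""
    if a == b:
        return P_MATCH
    ca, cb = PHONE_CLASS.get(a), PHONE_CLASS.get(b)
    if ca is not None and ca == cb:
        return P_SAME_CLASS
    return P_OTHER


def channel_prob(src, obs):
    """Unnormalized P(obs | src), summed over all edit alignments (forward DP)."""
    n, m = len(src), len(obs)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match or substitution
                f[i][j] += f[i - 1][j - 1] * sub_prob(src[i - 1], obs[j - 1])
            if i > 0:  # deletion of src[i-1]
                f[i][j] += f[i - 1][j] * P_DEL
            if j > 0:  # insertion of obs[j-1]
                f[i][j] += f[i][j - 1] * P_INS
    return f[n][m]


def posteriors(candidates, hypotheses, prior=None):
    """Posterior of each candidate being the channel input, given all hypotheses.

    Assumes hypotheses are conditionally independent given the input.
    """
    scores = {}
    for c in candidates:
        p = prior[c] if prior else 1.0
        for h in hypotheses:
            p *= channel_prob(c, h)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


if __name__ == "__main__":
    hyps = [phones("b a t"), phones("p a t"), phones("b a d")]
    cands = [phones("b a t"), phones("p a t"), phones("m a t")]
    post = posteriors(cands, hyps)
    for c, p in sorted(post.items(), key=lambda kv: -kv[1]):
        print(" ".join(c), round(p, 4))
```

In this toy setting, "b a t" receives the highest posterior. It is at most one similar-phoneme substitution away from every competing hypothesis. "m a t" ranks last because the nasal/stop confusion is assigned a low probability. Summing over alignments rather than taking only the best one is a design choice of this sketch. A Viterbi (max) variant would be an equally plausible reading of the abstract.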
Published in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-09 May 2014
Date Added to IEEE Xplore: 14 July 2014
Electronic ISBN: 978-1-4799-2893-4