Abstract:
We present new acoustic confidence scores for utterance verification based on novel combinations of phone-level posterior probability statistics. A common utterance-level acoustic confidence score in the literature is the arithmetic mean, computed over the utterance, of the phone log posterior probabilities. This approach can be problematic when a large part of the utterance is in-grammar (IG) but a small part is out-of-grammar (OOG). For example, a caller says the OOG name "Larry", which is incorrectly recognized as the IG name "Harry". Since most phones were correctly recognized, the mean of the phone posteriors gives a high utterance-level score even though the recognition result should ideally be rejected. We introduce additional statistics, such as the variance and low percentile points of the phone-posterior scores over the utterance, that help capture localized deviations within otherwise good recognition matches. We report on experiments combining these statistics. In particular, by normalizing the mean with the standard deviation, we achieved a 10-20% relative improvement in performance on alpha-digit test sets, where OOG utterances are often incorrectly recognized as very similar IG ones.
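The "Larry"/"Harry" effect can be illustrated with a minimal Python sketch. It compares two hypothetical utterances that share the same mean phone log posterior, where one contains a single badly matched phone. The phone scores, the function names, the nearest-rank percentile, and the `weight` parameter are all assumptions made for illustration. The abstract does not give the exact normalization formula, so the combined score below (mean minus a weighted standard deviation) is only a stand-in and should not be read as the authors' method.

```python
import math
import statistics


def utterance_stats(phone_log_posteriors, low_pct=10):
    """Summarize per-phone log posteriors for one utterance.

    Returns the mean, the population standard deviation, and a low
    percentile point (nearest-rank method; an assumption of this sketch).
    """
    scores = sorted(phone_log_posteriors)
    rank = max(1, math.ceil(low_pct / 100 * len(scores)))
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "low_percentile": scores[rank - 1],
    }


def penalized_score(stats, weight=1.0):
    """Hypothetical combination: lower the mean by the spread of the scores.

    This is an illustrative stand-in, not the formula from the paper.
    """
    return stats["mean"] - weight * stats["std"]


# Made-up log posteriors: every phone matches reasonably well.
good = [-0.10, -0.20, -0.10, -0.15, -0.10]
# Made-up log posteriors: four near-perfect phones and one poor match,
# chosen so that the mean equals that of `good`.
one_bad = [-0.02, -0.02, -0.02, -0.02, -0.57]

good_stats = utterance_stats(good)
bad_stats = utterance_stats(one_bad)

for name, stats in (("good", good_stats), ("one_bad", bad_stats)):
    print(name, stats, "penalized:", penalized_score(stats))
```

In this toy data the mean cannot separate the two utterances, while the standard deviation and the low percentile point both expose the single poor phone, which is the motivation the abstract gives for adding these statistics.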
Published in: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
Date of Conference: 06-10 April 2003
Date Added to IEEE Xplore: 21 May 2003
Print ISBN: 0-7803-7663-3
Print ISSN: 1520-6149