As voice-driven intelligent assistants become commonplace, adaptation
to user context becomes critical for Automatic Speech Recognition (ASR)
systems. For example, ASR systems may be expected to recognize a user’s
contact names containing improbable or out-of-vocabulary (OOV) words.
We introduce a method to identify contextual cues in a first-pass
ASR system’s output and to recover out-of-lattice hypotheses
that are contextually relevant. Our proposed module is agnostic to
the architecture of the underlying recognizer, provided that it generates
a word lattice of hypotheses, and it is compact enough for on-device use.
The module identifies subgraphs in the lattice that are likely to contain
named entities (NEs), recovers phoneme hypotheses over corresponding
time spans, and inserts NEs that are phonetically close to those hypotheses.
We measure a decrease in the mean word error rate (WER) of word lattices
from 11.5% to 4.9% on a test set of NEs.
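The last step of the module, inserting NEs that are phonetically close to the recovered phoneme hypotheses, can be illustrated with a toy sketch. It assumes a lattice stored as a flat list of timed arcs with topologically numbered states, a subgraph span and its phoneme hypothesis that are already given (NE-span detection and phoneme recovery are not shown), and plain Levenshtein distance over phoneme labels as the phonetic similarity. `Arc`, `insert_entities`, the contact lexicon, and the ARPAbet-style pronunciations are all hypothetical; this is not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    """One word hypothesis in a toy lattice (hypothetical representation)."""
    src: int      # source state; states assumed numbered in topological order
    dst: int      # destination state
    word: str
    t0: float     # start time in seconds
    t1: float     # end time in seconds

def phoneme_edit_distance(a, b):
    """Plain Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute or match
        prev = cur
    return prev[-1]

def closest_entities(phones, lexicon, max_dist):
    """Entity names whose pronunciation is within max_dist of phones, nearest first."""
    scored = sorted((phoneme_edit_distance(phones, pron), name)
                    for name, pron in lexicon.items())
    return [name for dist, name in scored if dist <= max_dist]

def insert_entities(arcs, span, phones, lexicon, max_dist=1):
    """Return arcs plus one parallel arc per close entity between states span=(src, dst)."""
    src, dst = span
    covered = [a for a in arcs if a.src >= src and a.dst <= dst]
    if not covered:
        return list(arcs)  # nothing to align the new arcs with
    t0 = min(a.t0 for a in covered)  # new arcs inherit the span's time bounds
    t1 = max(a.t1 for a in covered)
    added = [Arc(src, dst, name, t0, t1)
             for name in closest_entities(phones, lexicon, max_dist)]
    return list(arcs) + added

# Toy first-pass lattice for "call jack sir reno" (a single path, for brevity).
lattice = [Arc(0, 1, "call", 0.0, 0.3), Arc(1, 2, "jack", 0.3, 0.6),
           Arc(2, 3, "sir", 0.6, 0.8), Arc(3, 4, "reno", 0.8, 1.2)]
# Hypothetical contact lexicon with ARPAbet-style pronunciations.
contacts = {"serrino": ["S", "ER", "IY", "N", "OW"],
            "sabrina": ["S", "AH", "B", "R", "IY", "N", "AH"],
            "reno": ["R", "IY", "N", "OW"]}
# Assumed phoneme hypothesis recovered over states 2..4.
span_phones = ["S", "ER", "IY", "N", "OW"]

for arc in insert_entities(lattice, (2, 4), span_phones, contacts):
    print(arc)
```

With `max_dist=1`, only `serrino` (distance 0) is inserted as a new arc from state 2 to state 4; `reno` is two edits away and `sabrina` further still. A practical system would likely score the inserted arcs and use a confusability-aware phonetic distance rather than uniform edit costs.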
Cite as: Serrino, J., Velikovich, L., Aleksic, P., Allauzen, C. (2019) Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition. Proc. Interspeech 2019, 3830-3834, doi: 10.21437/Interspeech.2019-2962
@inproceedings{serrino19_interspeech, author={Jack Serrino and Leonid Velikovich and Petar Aleksic and Cyril Allauzen}, title={{Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition}}, year=2019, booktitle={Proc. Interspeech 2019}, pages={3830--3834}, doi={10.21437/Interspeech.2019-2962} }