Latency is a crucial metric for streaming speech recognition systems.
In this paper, we reduce latency by fetching responses early based
on partial recognition results, a technique we refer to as prefetching.
Specifically, prefetching works by submitting partial recognition results
for subsequent processing such as obtaining assistant server responses
or second-pass rescoring before the recognition result is finalized.
If the partial result matches the final recognition result, the
prefetched response can be delivered to the user instantly. This effectively
speeds up the system by hiding the execution latency that would otherwise
be incurred after recognition completes.
Prefetching can be
triggered multiple times for a single query, but each trigger incurs
another round of downstream processing and increases computation cost.
It is hence desirable to fetch the result sooner while limiting
the number of prefetches. To achieve the best trade-off between latency
and computation cost, we investigate a series of prefetching decision
models, including decoder-silence-based prefetching, acoustic-silence-based
prefetching, and end-to-end prefetching.
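The core mechanism can be illustrated with a minimal sketch. This is not the paper's implementation; the names (`fetch_response`, `recognize_with_prefetch`, the toy decision rule) are assumptions chosen for illustration, and the decision model here is a crude stand-in for the silence-based and end-to-end models studied in the paper.

```python
def fetch_response(text):
    # Stand-in for downstream processing (e.g., an assistant server
    # lookup or second-pass rescoring) triggered on a hypothesis.
    return f"response({text})"

def recognize_with_prefetch(partials, final, should_prefetch):
    """Serve the final response, reusing a prefetched one when it matches.

    partials:        sequence of streaming partial hypotheses
    final:           finalized recognition result
    should_prefetch: decision model mapping a partial hypothesis to bool
    """
    cache = {}           # partial text -> prefetched response
    num_prefetches = 0   # each prefetch costs one round of downstream work
    for hyp in partials:
        if should_prefetch(hyp) and hyp not in cache:
            cache[hyp] = fetch_response(hyp)  # fired before finalization
            num_prefetches += 1
    if final in cache:
        # Prefetch hit: the response is already available, so the
        # post-recognition execution latency is hidden from the user.
        return cache[final], num_prefetches, True
    # Prefetch miss: fall back to fetching after finalization.
    return fetch_response(final), num_prefetches, False

# Toy decision rule: prefetch once the hypothesis has at least two words
# (a hypothetical proxy for "the query looks complete").
result, n, hit = recognize_with_prefetch(
    partials=["play", "play some", "play some jazz"],
    final="play some jazz",
    should_prefetch=lambda h: len(h.split()) >= 2,
)
```

A looser decision rule fires earlier and more often (lower latency, higher cost), while a stricter one fires rarely (lower cost, higher latency); the decision models in the paper navigate exactly this trade-off.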
In this paper, we
demonstrate that the proposed prefetching mechanism reduces latency by
approximately 200 ms for a system consisting of a streaming first-pass
recurrent neural network transducer (RNN-T) model and a non-streaming
second-pass Listen, Attend and Spell (LAS) rescoring model. We observe
that end-to-end prefetching provides the best trade-off between cost and
latency and is 120 ms faster than silence-based prefetching
at a fixed prefetch rate.
Cite as: Chang, S.-Y., Li, B., Rybach, D., He, Y., Li, W., Sainath, T.N., Strohman, T. (2020) Low Latency Speech Recognition Using End-to-End Prefetching. Proc. Interspeech 2020, 1962-1966, doi: 10.21437/Interspeech.2020-1898
@inproceedings{chang20b_interspeech,
  author    = {Shuo-Yiin Chang and Bo Li and David Rybach and Yanzhang He and Wei Li and Tara N. Sainath and Trevor Strohman},
  title     = {{Low Latency Speech Recognition Using End-to-End Prefetching}},
  year      = {2020},
  booktitle = {Proc. Interspeech 2020},
  pages     = {1962--1966},
  doi       = {10.21437/Interspeech.2020-1898}
}