Abstract:
Modeling text queries as sequences of embeddings, so that search can be conducted by similarity matching within speech features, has recently been shown to improve keyword search (KWS) performance, especially for out-of-vocabulary (OOV) terms. This technique uses a dynamic time warping based search, converting the KWS problem into a pattern search problem by artificially modeling text queries as pronunciation-based embedding sequences. The query is modeled by concatenating and repeating frame representations for each phoneme in the keyword's pronunciation. In this letter, we propose a query model that incorporates temporal context information using recurrent neural networks (RNNs) trained to generate realistic query representations. In experiments on the IARPA Babel Program's Turkish and Zulu datasets, we show that the proposed RNN-based query generation yields significant improvements over the statistical query models of earlier work and achieves performance comparable to state-of-the-art techniques for OOV KWS.
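As a rough illustration of the baseline the abstract describes (not the proposed RNN model, and not the authors' implementation), the sketch below builds a query by repeating and concatenating one vector per phoneme, then locates it in a feature sequence with a subsequence variant of dynamic time warping. The names `build_query`, `subsequence_dtw`, `phone_vectors`, and the fixed `frames_per_phone` duration are assumptions made for this example; the abstract does not specify how durations or frame representations are chosen.

```python
import numpy as np

def build_query(phonemes, phone_vectors, frames_per_phone=3):
    """Concatenate each phoneme's vector, repeated frames_per_phone times.

    phone_vectors is a hypothetical dict mapping a phoneme label to one
    representative frame embedding; the fixed duration is an assumption.
    """
    rows = [np.tile(np.asarray(phone_vectors[p], dtype=float), (frames_per_phone, 1))
            for p in phonemes]
    return np.vstack(rows)

def subsequence_dtw(query, speech):
    """Return (cost, end_frame) of the best match of query inside speech."""
    # Pairwise Euclidean distance between every query frame and speech frame.
    dist = np.linalg.norm(query[:, None, :] - speech[None, :, :], axis=2)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # the match may start at any speech frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            acc[i, j] = dist[i, j] + min(acc[i - 1, j],      # query advances
                                         acc[i, j - 1],      # speech advances
                                         acc[i - 1, j - 1])  # both advance
    end = int(np.argmin(acc[-1]))  # the match may end at any speech frame
    return float(acc[-1, end]), end

# Toy usage: two made-up phoneme vectors and a short synthetic "utterance".
phone_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
query = build_query(["a", "b"], phone_vectors, frames_per_phone=2)
silence = np.zeros((3, 2))
speech = np.vstack([silence, query, silence])
cost, end = subsequence_dtw(query, speech)
```

In a real system the accumulated cost would typically be normalized by path length and thresholded to decide whether the keyword is present; that step is omitted here.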
Published in: IEEE Signal Processing Letters (Volume: 26, Issue: 1, January 2019)