ISCA Archive Interspeech 2020

Deep Template Matching for Small-Footprint and Configurable Keyword Spotting

Peng Zhang, Xueliang Zhang

Keyword spotting (KWS) is an important technique in human–machine interaction for detecting trigger phrases and voice commands. In practice, consumers and device vendors often want to define their own keywords conveniently. In this paper, we propose a novel template matching approach for KWS based on an end-to-end deep learning method, which uses an attention mechanism to match the input voice to the keyword templates in a high-level feature space. The proposed approach requires very few voice samples (as few as one) to register a new keyword, without any retraining. We conduct experiments on the publicly available Google Speech Commands dataset. The experimental results demonstrate that our method outperforms baseline methods while allowing for flexible configuration.
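
As a rough illustration of the idea in the abstract (matching an input to registered keyword templates with attention in a feature space, with no retraining needed to add a keyword), the following minimal sketch may help. It is not the authors' model: the scaled dot-product attention, the cosine scoring, the `KeywordRegistry` class, the 0.8 threshold, and the toy 2-dimensional "frame embeddings" are all assumptions made for illustration. In the paper the features come from a trained network, which is omitted here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)

def attend(query_frames, template_frames):
    """For each template frame, return an attention-weighted summary of the input frames."""
    dim = len(template_frames[0])
    scale = math.sqrt(dim)
    aligned = []
    for t in template_frames:
        weights = softmax([dot(t, q) / scale for q in query_frames])
        aligned.append([sum(w * q[d] for w, q in zip(weights, query_frames))
                        for d in range(dim)])
    return aligned

def match_score(query_frames, template_frames):
    """Mean cosine similarity between each template frame and its attended input summary."""
    aligned = attend(query_frames, template_frames)
    sims = [cosine(a, t) for a, t in zip(aligned, template_frames)]
    return sum(sims) / len(sims)

class KeywordRegistry:
    """Hypothetical registry: adding a keyword only stores its template, nothing is retrained."""

    def __init__(self):
        self.templates = {}

    def register(self, keyword, template_frames):
        self.templates.setdefault(keyword, []).append(template_frames)

    def detect(self, query_frames, threshold=0.8):
        best_kw, best = None, -1.0
        for kw, temps in self.templates.items():
            for t in temps:
                s = match_score(query_frames, t)
                if s > best:
                    best_kw, best = kw, s
        return (best_kw, best) if best >= threshold else (None, best)

# Toy usage: one template per keyword, each a short list of 2-D frame embeddings.
registry = KeywordRegistry()
registry.register("yes", [[1.0, 0.0], [0.0, 1.0]])
registry.register("no", [[-1.0, 0.0], [0.0, -1.0]])
keyword, score = registry.detect([[1.0, 0.0], [0.0, 1.0]])
```

Registering a keyword here is just a dictionary insert, which mirrors the "no retraining" property claimed in the abstract. Any real system would replace the toy vectors with embeddings from a trained encoder and tune the threshold on held-out data.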


doi: 10.21437/Interspeech.2020-1761

Cite as: Zhang, P., Zhang, X. (2020) Deep Template Matching for Small-Footprint and Configurable Keyword Spotting. Proc. Interspeech 2020, 2572-2576, doi: 10.21437/Interspeech.2020-1761

@inproceedings{zhang20v_interspeech,
  author={Peng Zhang and Xueliang Zhang},
  title={{Deep Template Matching for Small-Footprint and Configurable Keyword Spotting}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2572--2576},
  doi={10.21437/Interspeech.2020-1761}
}