ISCA Archive Interspeech 2018

Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting

Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao, Jie Gao

Because on-device resources are limited and usage scenarios are complex, small-footprint keyword spotting calls for a compact model with high precision, low computational cost, and low latency. To meet these requirements, this paper investigates the compact Feedforward Sequential Memory Network (cFSMN), which combines low-rank matrix factorization with the conventional FSMN, for a far-field keyword spotting task. The effect of its architecture parameters is analyzed. To lower the computational cost, multiframe prediction (MFP) is applied to the cFSMN. To enhance modeling capacity, an advanced MFP variant is explored that inserts small DNN layers before the output layers. Performance is measured by the area under the curve (AUC) of detection error tradeoff (DET) curves. Experiments show that, compared with a well-tuned long short-term memory (LSTM) model that has the same latency and twice the computational cost, the cFSMN achieves relative AUC decreases of 18.11% and 29.21% on test sets recorded in quiet and noisy environments, respectively. With advanced MFP, the system achieves relative AUC decreases of 0.48% and 20.04% over the conventional cFSMN on the quiet and noisy test sets, respectively, while reducing the computational cost by a relative 46.58%.
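
The abstract does not spell out the cFSMN computation, so the following is only a minimal sketch of the two ideas it names: low-rank factorization of a weight matrix, and an FSMN-style memory block applied to the resulting low-dimensional projections. The function names, the layer sizes, the zero-padding at sequence edges, and the exact form of the memory sum (a skip connection plus per-dimension weighted look-back and look-ahead frames) are assumptions made for illustration and may differ from the configuration used in the paper.

```python
def low_rank_params(d_in, d_out, rank):
    """Weight counts for a d_in x d_out layer, before and after
    factorizing it through a bottleneck of size `rank`."""
    full = d_in * d_out
    factored = d_in * rank + rank * d_out
    return full, factored


def memory_block(p, lookback, lookahead):
    """Apply an FSMN-style memory block to a sequence of projections.

    p:         list of T frames, each a list of D floats
               (assumed to be the low-rank projections)
    lookback:  list of coefficient vectors; lookback[i] weights frame t-i
    lookahead: list of coefficient vectors; lookahead[j-1] weights frame t+j
    Frames outside the sequence are treated as zeros (an assumption).
    """
    T, D = len(p), len(p[0])
    out = []
    for t in range(T):
        frame = list(p[t])  # skip connection from the current projection
        for i, a in enumerate(lookback):
            if t - i >= 0:
                for d in range(D):
                    frame[d] += a[d] * p[t - i][d]
        for j, c in enumerate(lookahead, start=1):
            if t + j < T:
                for d in range(D):
                    frame[d] += c[d] * p[t + j][d]
        out.append(frame)
    return out


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the effect of factorization.
    print(low_rank_params(1024, 1024, 128))
    # Toy sequence: 3 frames, 1 dimension, 2 look-back taps, 1 look-ahead tap.
    print(memory_block([[1.0], [2.0], [3.0]], [[0.5], [0.25]], [[0.1]]))
```

In this sketch the number of look-ahead taps sets how many future frames must arrive before an output can be computed, which is one way such a model trades latency against context.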


doi: 10.21437/Interspeech.2018-1204

Cite as: Chen, M., Zhang, S., Lei, M., Liu, Y., Yao, H., Gao, J. (2018) Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting. Proc. Interspeech 2018, 2663-2667, doi: 10.21437/Interspeech.2018-1204

@inproceedings{chen18c_interspeech,
  author={Mengzhe Chen and ShiLiang Zhang and Ming Lei and Yong Liu and Haitao Yao and Jie Gao},
  title={{Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2663--2667},
  doi={10.21437/Interspeech.2018-1204}
}