Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing

Garg, Abhinav; Vadisetti, Gowtham P.; Gowda, Dhananjaya; Jin, Sichen; Jayasimha, Aditya; Han, Youngho; Kim, Jiyeon; Park, Junmo; Kim, Kwangyoun; Kim, Sooyeon; Lee, Young-yoon; Min, Kyungbo; Kim, Chanwoo

doi:10.21437/Interspeech.2020-3172

In this paper, we present our streaming on-device end-to-end speech recognition solution for a privacy-sensitive voice-typing application, which primarily involves typing users' private details and passwords. We highlight challenges specific to the voice-typing scenario in the Korean language and propose solutions to these problems within the framework of a streaming attention-based speech recognition system. Important challenges in voice-typing include the choice of output units, the coupling of multiple characters into longer byte-pair encoded units, and the lack of sufficient training data. Apart from customizing a high-accuracy open-domain streaming speech recognition model for voice-typing applications, we retain the performance of the model on open-domain tasks without significant degradation. We also explore domain biasing using shallow fusion with a weighted finite state transducer (WFST). We obtain approximately 13% relative word error rate (WER) improvement on our internal Korean voice-typing dataset without a WFST and about 30% additional WER improvement with WFST fusion.
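As a rough illustration of the shallow-fusion idea mentioned in the abstract, the sketch below combines an ASR log-probability with a weighted log-score from a biasing model and picks the best hypothesis. It is not the authors' implementation: the function names, the fusion weight of 0.3, the floor score, and the use of a plain lookup table in place of a real WFST are all assumptions made for illustration, and fusion is applied here to whole hypotheses rather than at each beam-search step.

```python
import math

# Hypothetical floor score for hypotheses the biasing model does not cover.
UNSEEN_LOGP = math.log(1e-4)


def shallow_fusion_score(asr_logp, bias_logp, weight):
    """Log-linear combination: ASR score plus a weighted biasing score."""
    return asr_logp + weight * bias_logp


def rescore(hypotheses, bias_scores, weight=0.3):
    """Pick the hypothesis with the highest fused score.

    hypotheses:  dict mapping text -> ASR log-probability
    bias_scores: dict mapping text -> biasing log-probability
                 (a toy stand-in for scores read off a WFST)
    """
    fused = {
        text: shallow_fusion_score(
            asr_logp, bias_scores.get(text, UNSEEN_LOGP), weight
        )
        for text, asr_logp in hypotheses.items()
    }
    best = max(fused, key=fused.get)
    return best, fused


# Toy example: the ASR model slightly prefers the wrong transcript,
# while the biasing model only knows the in-domain phrase.
hypotheses = {
    "my pin is 1 2 3 4": math.log(0.40),
    "my pen is 1 2 3 4": math.log(0.45),
}
bias_scores = {"my pin is 1 2 3 4": math.log(0.9)}

best_fused, _ = rescore(hypotheses, bias_scores, weight=0.3)
best_plain, _ = rescore(hypotheses, bias_scores, weight=0.0)
```

With the weight set to zero the ranking falls back to the ASR scores alone, so the fusion weight controls how strongly the domain bias can override the open-domain model.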

Cite as: Garg, A., Vadisetti, G.P., Gowda, D., Jin, S., Jayasimha, A., Han, Y., Kim, J., Park, J., Kim, K., Kim, S., Lee, Y.-y., Min, K., Kim, C. (2020) Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing. Proc. Interspeech 2020, 3371-3375, doi: 10.21437/Interspeech.2020-3172

@inproceedings{garg20b_interspeech,
  author={Abhinav Garg and Gowtham P. Vadisetti and Dhananjaya Gowda and Sichen Jin and Aditya Jayasimha and Youngho Han and Jiyeon Kim and Junmo Park and Kwangyoun Kim and Sooyeon Kim and Young-yoon Lee and Kyungbo Min and Chanwoo Kim},
  title={{Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3371--3375},
  doi={10.21437/Interspeech.2020-3172}
}