Flexible and Expandable Speech Recognition Hardware with Weighted Finite State Transducers

You, Kisun; Choi, Jungwook; Sung, Wonyong

doi:10.1007/s11265-011-0587-9

Published: 15 May 2011

Volume 66, pages 235–244, (2012)

Journal of Signal Processing Systems

Abstract

Hardware implementation of speech recognition can both accelerate decoding for real-time processing and reduce power consumption. Recently, the weighted finite state transducer (WFST) has emerged as a major recognition network representation because it reduces the algorithmic complexity of decoding by applying many optimizations to the network offline. However, hardware implementation of continuous speech recognition (CSR) with a WFST network is challenging, mainly because the Viterbi search must traverse a large network with limited hardware resources. This paper presents two hardware speech recognition systems based on the WFST network. The first, called the SRAM-oriented system, uses the internal SRAM as a hash table to manage the active working set efficiently. This system is flexible because it can accommodate different speech recognition tasks as long as the SRAM capacity permits. For easy expansion, we also propose the DRAM-oriented system, in which the active working set is stored in external DRAM. To hide the long latency of DRAM access, a split DRAM hash table is employed, which stores the active working set in the open rows of the DRAM to reduce the number of row misses. Experimental results show that the SRAM-oriented system decodes the 5k-word CSR task 4.93 times faster than real time, while the DRAM-oriented system runs 4.48 times faster than real time with only about half the SRAM capacity.
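
The idea of keeping the active working set of a Viterbi search in a fixed-size hash table can be illustrated in software. The following is a minimal sketch under stated assumptions, not the authors' hardware design: the class `ActiveSetTable`, the function `viterbi_step`, the linear-probing scheme, the beam pruning rule, and the toy arc list are all hypothetical, and acoustic scores are assumed to be folded into the arc costs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy WFST: state -> list of (next_state, arc_cost).
# Labels are omitted; acoustic costs are assumed folded into arc_cost.
Arcs = Dict[int, List[Tuple[int, float]]]


class ActiveSetTable:
    """Fixed-capacity open-addressing hash table keyed by WFST state ID."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.keys: List[Optional[int]] = [None] * capacity
        self.costs: List[float] = [0.0] * capacity

    def relax(self, state: int, cost: float) -> bool:
        """Insert a state, or keep the lower cost if it is already present.

        Returns False when the table is full and the state cannot be stored.
        """
        slot = state % self.capacity
        for _ in range(self.capacity):
            if self.keys[slot] is None:
                self.keys[slot] = state
                self.costs[slot] = cost
                return True
            if self.keys[slot] == state:
                if cost < self.costs[slot]:
                    self.costs[slot] = cost  # Viterbi: keep the best path
                return True
            slot = (slot + 1) % self.capacity  # linear probing on collision
        return False

    def items(self) -> List[Tuple[int, float]]:
        return [(k, c) for k, c in zip(self.keys, self.costs) if k is not None]


def viterbi_step(active: ActiveSetTable, arcs: Arcs, beam: float) -> ActiveSetTable:
    """Expand every active state by one step, then prune outside the beam."""
    expanded = ActiveSetTable(active.capacity)
    for state, cost in active.items():
        for dst, arc_cost in arcs.get(state, []):
            if not expanded.relax(dst, cost + arc_cost):
                raise MemoryError("active set exceeds table capacity")
    entries = expanded.items()
    if not entries:
        return expanded
    best = min(c for _, c in entries)
    pruned = ActiveSetTable(active.capacity)
    for state, cost in entries:
        if cost <= best + beam:
            pruned.relax(state, cost)
    return pruned


# Toy network: two paths from state 0 to state 3.
toy_arcs: Arcs = {0: [(1, 1.0), (2, 5.0)], 1: [(3, 1.0)], 2: [(3, 0.5)]}
start = ActiveSetTable(8)
start.relax(0, 0.0)
frame1 = viterbi_step(start, toy_arcs, beam=10.0)
frame2 = viterbi_step(frame1, toy_arcs, beam=10.0)
```

In this sketch the table capacity plays the role of the SRAM budget: a task fits as long as the pruned active set never outgrows the table, which is one way to read the flexibility claim above. The DRAM row-locality technique is not modelled here.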

Acknowledgements

This work was supported in part by the Brain Korea 21 Project and the National Research Foundation of Korea (NRF) grants funded by the Korea government (MEST) (No. 20090075770 and 20090084804).

Author information

Authors and Affiliations

School of Electrical Engineering, Seoul National University, Seoul, 151-744, Korea
Kisun You, Jungwook Choi & Wonyong Sung

Corresponding author

Correspondence to Kisun You.

About this article

Cite this article

You, K., Choi, J. & Sung, W. Flexible and Expandable Speech Recognition Hardware with Weighted Finite State Transducers. J Sign Process Syst 66, 235–244 (2012). https://doi.org/10.1007/s11265-011-0587-9

Received: 03 October 2010
Revised: 11 April 2011
Accepted: 11 April 2011
Published: 15 May 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s11265-011-0587-9
