Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition

You, Kisun; Choi, Young-kyu; Choi, Jungwook; Sung, Wonyong

doi:10.1007/s11265-009-0436-2

Published: 27 November 2009

Journal of Signal Processing Systems, Volume 63, pages 95–105 (2011)

Abstract

We have developed a memory-access-reduced VLSI chip for 5,000-word speaker-independent continuous speech recognition. The chip employs a context-dependent, hidden Markov model (HMM)-based speech recognition algorithm and contains parallel and pipelined hardware units for emission probability computation and Viterbi beam search. To maximize performance, we adopted several memory access reduction techniques, such as sub-vector clustering and multi-block processing, for the emission probability computation. We also employed a custom DRAM controller for efficient access to consecutive data. Moreover, we analyzed the data access patterns to minimize the internal SRAM size while maintaining high performance. The experimental results show that the implemented system performs speech recognition 2.4 and 1.8 times faster than real time when using 32-bit DDR SDRAM and SDR SDRAM, respectively.
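The abstract names sub-vector clustering and multi-block processing without giving their details. The following is a minimal NumPy sketch of how these two ideas are commonly combined, under assumptions that are ours and not taken from the chip: diagonal-covariance Gaussian mixtures, feature dimensions partitioned into fixed subspaces, each Gaussian's sub-vector (mean, variance) quantized to one of `P` shared prototypes per subspace, and a block of `T` frames scored together so that each state's prototype indices are read once per block rather than once per frame. The function and parameter names (`subspace_tables`, `block_emission_scores`, `codes`, `dims`) are hypothetical.

```python
import numpy as np


def subspace_tables(frames, proto_means, proto_vars, dims):
    """Log density of every prototype in every subspace, for every frame.

    frames:      (T, D) block of feature vectors
    proto_means: list over K subspaces of (P, d_k) prototype means
    proto_vars:  list over K subspaces of (P, d_k) diagonal variances
    dims:        list over K subspaces of index arrays into the D dimensions
    Assumes every subspace has the same number of prototypes P.
    Returns a (T, K, P) table.
    """
    T = frames.shape[0]
    K = len(dims)
    P = proto_means[0].shape[0]
    table = np.empty((T, K, P))
    for k, idx in enumerate(dims):
        x = frames[:, idx][:, None, :]                       # (T, 1, d_k)
        mu, var = proto_means[k][None], proto_vars[k][None]  # (1, P, d_k)
        table[:, k, :] = -0.5 * np.sum(
            np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)
    return table


def block_emission_scores(table, codes, log_weights):
    """Log emission probabilities for all frames of the block.

    table:       (T, K, P) output of subspace_tables
    codes:       (S, M, K) prototype index per state, mixture component, subspace
    log_weights: (S, M) mixture log-weights
    Returns a (T, S) array of log b_s(x_t).
    """
    T, K, _ = table.shape
    S = codes.shape[0]
    k_idx = np.arange(K)
    out = np.empty((T, S))
    for s in range(S):
        # Models the hardware reading this state's indices once per block;
        # they are then reused for all T frames.
        c = codes[s]                                          # (M, K)
        # Component log-likelihood = sum over subspaces of table lookups.
        comp = table[:, k_idx, c].sum(axis=-1) + log_weights[s]  # (T, M)
        # Numerically stable log-sum-exp over mixture components.
        m = comp.max(axis=-1, keepdims=True)
        out[:, s] = (m + np.log(np.exp(comp - m).sum(axis=-1, keepdims=True)))[:, 0]
    return out
```

Because a diagonal Gaussian's log density is a sum over dimensions, the table-lookup score equals the score computed from the reconstructed full-dimensional mean and variance. The saving is that per-Gaussian storage shrinks from 2D parameters to K small indices, and within a block of T frames those indices are fetched once instead of T times. How the chip selects which states to score within a block under beam search is not covered by this sketch.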

Acknowledgements

This work was supported in part by the Human Resource Development Project for IT SoC Architect and the Brain Korea 21 Project. This work was also supported in part by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (No. 0414-20090017).

Author information

Authors and Affiliations

School of Electrical Engineering, Seoul National University, Seoul, 151-744, South Korea
Kisun You, Young-kyu Choi, Jungwook Choi & Wonyong Sung

Corresponding author

Correspondence to Kisun You.

About this article

Cite this article

You, K., Choi, Yk., Choi, J. et al. Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition. J Sign Process Syst 63, 95–105 (2011). https://doi.org/10.1007/s11265-009-0436-2

Received: 14 July 2009
Revised: 22 October 2009
Accepted: 10 November 2009
Published: 27 November 2009
Issue Date: April 2011
DOI: https://doi.org/10.1007/s11265-009-0436-2
