
Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition

Journal of Signal Processing Systems

Abstract

We have developed a memory-access-reduced VLSI chip for 5,000-word speaker-independent continuous speech recognition. The chip employs a speech recognition algorithm based on context-dependent hidden Markov models (HMMs). It contains parallel, pipelined hardware units for emission probability computation and Viterbi beam search. To maximize performance, we adopted several memory access reduction techniques for the emission probability computation, including sub-vector clustering and multi-block processing. We also employed a custom DRAM controller for efficient access to consecutive data. Moreover, we analyzed the data access patterns to minimize the internal SRAM size while maintaining high performance. Experimental results show that the implemented system performs speech recognition 2.4 and 1.8 times faster than real time when using 32-bit DDR SDRAM and SDR SDRAM, respectively.
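The sketch below illustrates the general sub-vector clustering idea for emission probability computation. In this technique, the feature vector is split into sub-vectors, and each Gaussian's per-sub-vector parameters are replaced by a shared prototype (codeword). For each frame, every codeword's partial log-likelihood is computed once and stored in a small table. Each Gaussian is then scored by summing a few table lookups, so only its codeword indices, not its full mean and variance vectors, are read from memory.

This is a minimal sketch, not the chip's implementation. It assumes diagonal-covariance Gaussians, a hypothetical 4-dimensional feature split into two 2-dimensional sub-vectors, and illustrative function names (`subvector_tables`, `gaussian_score`, `state_loglik`) that are not taken from the paper.

```python
import math

# Hypothetical layout: a 4-dimensional feature split into 2 sub-vectors of 2 dims.
SUBVECTORS = [(0, 2), (2, 4)]  # [start, end) dimension ranges


def partial_loglik(x, mean, var):
    """Diagonal-Gaussian log-likelihood of feature slice x (no mixture weight)."""
    s = 0.0
    for xi, mi, vi in zip(x, mean, var):
        s += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return s


def subvector_tables(frame, codebooks):
    """Once per frame: score every codeword of every sub-vector codebook.

    codebooks[k] is a list of (mean, var) prototypes for sub-vector k.
    """
    tables = []
    for (start, end), book in zip(SUBVECTORS, codebooks):
        x = frame[start:end]
        tables.append([partial_loglik(x, m, v) for m, v in book])
    return tables


def gaussian_score(indices, tables):
    """A Gaussian is stored only as one codeword index per sub-vector;
    its log-likelihood is the sum of the corresponding table entries."""
    return sum(table[i] for table, i in zip(tables, indices))


def state_loglik(mixture, tables):
    """Emission log-probability of one HMM state: log-sum-exp over its
    mixture components, each given as (log_weight, codeword indices)."""
    scores = [lw + gaussian_score(idx, tables) for lw, idx in mixture]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


# Usage with toy (made-up) codebooks.
codebooks = [
    [([0.0, 0.0], [1.0, 1.0]), ([1.0, 1.0], [0.5, 0.5])],
    [([0.0, 0.0], [1.0, 1.0]), ([2.0, 2.0], [1.0, 1.0])],
]
frame = [0.0, 0.0, 2.0, 2.0]
tables = subvector_tables(frame, codebooks)
score = gaussian_score((0, 1), tables)
```

The table cost per frame depends only on the codebook sizes, not on the number of Gaussians. This is why the technique reduces both computation and parameter traffic when many Gaussians share a few prototypes.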
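Multi-block processing can be illustrated by counting parameter fetches. This minimal sketch assumes that the term means scoring several consecutive feature frames per pass over the acoustic-model parameters. Under that reading, each Gaussian is fetched from external memory once per block of `block` buffered frames and reused for every frame in the block, instead of being fetched once per frame. The function name `score_frames` and the fetch counter are hypothetical. For simplicity, the sketch also scores every Gaussian in every frame, whereas a real decoder would restrict scoring to the states active in the search.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def score_frames(frames: List[Frame], gaussians: list, block: int,
                 score: Callable[[object, Frame], float]):
    """Compute score(g, frame) for every Gaussian g and frame, processing
    `block` frames per pass over the Gaussian parameters.

    Returns (scores[frame_idx][gauss_idx], number_of_parameter_fetches).
    """
    n = len(frames)
    scores = [[0.0] * len(gaussians) for _ in range(n)]
    fetches = 0
    for start in range(0, n, block):
        block_frames = range(start, min(start + block, n))
        for gi, g in enumerate(gaussians):
            fetches += 1  # one (simulated) external-memory read of g's parameters
            for t in block_frames:  # reuse g for every buffered frame in the block
                scores[t][gi] = score(g, frames[t])
    return scores, fetches
```

For example, with 10 frames, 3 Gaussians and `block=4`, the parameters are fetched 9 times instead of 30, and the scores are identical. The trade-off is that the on-chip buffer must hold `block` feature frames and their results, and recognition output is delayed by up to one block.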
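The Viterbi beam search can be sketched for a single HMM as follows. This is a textbook-style illustration, not the chip's search unit. It assumes log-domain scores, a dense transition matrix, and a fixed beam width `beam` measured from the best score in each frame. States whose accumulated score falls more than `beam` below the best are pruned and not expanded in the next frame, which also limits how many emission probabilities must be looked up. The function name `viterbi_beam` and its arguments are hypothetical.

```python
import math
from typing import List

NEG_INF = -math.inf


def viterbi_beam(log_trans: List[List[float]], log_emit: List[List[float]],
                 log_init: List[float], beam: float):
    """Beam-pruned Viterbi search over a single HMM.

    log_trans[i][j]: log P(state j | state i)
    log_emit[t][j]:  log emission probability of frame t in state j
    log_init[j]:     log initial probability of state j
    Returns (best final log score, best state sequence).
    """
    n_states = len(log_init)
    score = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        best = max(score)
        # Beam pruning: only states within `beam` of the best are expanded.
        active = [i for i in range(n_states) if score[i] >= best - beam]
        new_score = [NEG_INF] * n_states
        bp = [-1] * n_states
        for i in active:
            for j in range(n_states):
                cand = score[i] + log_trans[i][j]
                if cand > new_score[j]:
                    new_score[j] = cand
                    bp[j] = i
        # Add emission scores only for states reached from an active state.
        for j in range(n_states):
            if new_score[j] > NEG_INF:
                new_score[j] += log_emit[t][j]
        score = new_score
        backptrs.append(bp)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: score[j])
    path = [last]
    for step in reversed(backptrs):
        path.append(step[path[-1]])
    path.reverse()
    return score[last], path
```

Usage example: a two-state left-to-right model with `log_init = [0.0, -math.inf]`, `log_trans = [[math.log(0.5), math.log(0.5)], [-math.inf, 0.0]]` and `log_emit = [[0.0, -10.0], [-10.0, 0.0], [-10.0, 0.0]]` yields the path `[0, 1, 1]` with score `log(0.5)`, whether `beam` is 5.0 or unbounded. In a large-vocabulary decoder, the same pruning is applied across all active word and phone HMMs, which is what keeps the search tractable.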





Acknowledgements

This work was supported in part by the Human Resource Development Project for IT SoC Architect and the Brain Korea 21 Project. It was also supported in part by a Korea Science and Engineering Foundation (KOSEF) grant funded by the Korean government (MEST) (No. 0414-20090017).

Author information


Corresponding author

Correspondence to Kisun You.


About this article

Cite this article

You, K., Choi, Yk., Choi, J. et al. Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition. J Sign Process Syst 63, 95–105 (2011). https://doi.org/10.1007/s11265-009-0436-2


  • DOI: https://doi.org/10.1007/s11265-009-0436-2
