Speech recognition for mobile devices

Abstract

This article presents an overview of different approaches for providing automatic speech recognition (ASR) technology to mobile users. Three principal system architectures, distinguished by how they employ the wireless communication link, are analyzed: Embedded Speech Recognition Systems, Network Speech Recognition (NSR) and Distributed Speech Recognition (DSR). The article surveys the solutions that have been standardized so far and offers a critical analysis of the latest developments in speech recognition for mobile environments. Open issues as well as the pros and cons of the different methodologies and techniques are highlighted, with special emphasis on the constraints and limitations that ASR applications face under each architecture.
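
To make the architectural distinction concrete, the following minimal Python sketch (not taken from the article) illustrates where the two main ASR stages, front-end feature extraction and decoding, run in each of the three architectures. The feature extractor and the decoder here are toy placeholders; in a real DSR deployment the client would run a standardized front-end (e.g. the ETSI DSR front-end) and the server a full recognizer.

```python
# Illustrative sketch only: which ASR stages run on the handset vs. the server
# in embedded, network (NSR) and distributed (DSR) speech recognition.
# The "front-end" and "decoder" below are hypothetical stand-ins.

import numpy as np


def extract_features(waveform: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Toy front-end: per-frame log energy stands in for MFCC extraction."""
    n_frames = 1 + max(0, (len(waveform) - frame_len) // hop)
    frames = np.stack([waveform[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)[:, None]


def decode(features: np.ndarray) -> str:
    """Toy back-end: a real system would run an HMM/DNN search here."""
    return f"<hypothesis from {features.shape[0]} frames>"


def embedded_asr(waveform: np.ndarray) -> str:
    # Embedded: front-end and decoder both run on the handset; no uplink needed.
    return decode(extract_features(waveform))


def network_asr(waveform: np.ndarray) -> str:
    # NSR: the handset transmits (codec-compressed) speech; the server does everything.
    speech_sent_over_uplink = waveform                        # stand-in for a codec bitstream
    return decode(extract_features(speech_sent_over_uplink))  # server side


def distributed_asr(waveform: np.ndarray) -> str:
    # DSR: the handset extracts and compresses features; only the feature stream
    # crosses the wireless link, while decoding stays on the server.
    features_sent_over_uplink = extract_features(waveform)    # client side
    return decode(features_sent_over_uplink)                  # server side


if __name__ == "__main__":
    audio = np.random.randn(16000)  # one second of fake 16 kHz audio
    for system in (embedded_asr, network_asr, distributed_asr):
        print(system.__name__, "->", system(audio))
```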

References

  • 3GPP (2004). Recognition performance evaluations of codecs for speech enabled services (SES) (3GPP TR 26.943).

  • Bocchieri, E. (2008). In Automatic speech recognition on mobile devices and over communication networks (advances in pattern recognition), Fixed-point arithmetic (pp. 255–274). Berlin: Springer.

    Chapter  Google Scholar 

  • ETSI (2002). Distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithm (ETSI Standard ES 202 050).

  • Fingscheidt, T., & Vary, P. (2001). Softbit speech decoding: A new approach to error concealment. IEEE Transactions on Speech and Audio Processing, 9(3), 240–251.

    Article  Google Scholar 

  • Gartner (2009). Gartner says worldwide smartphone sales reached its lowest growth rate with 3.7 per cent increase in fourth quarter of 2008. Press release.

  • Hacioglu, K., & Pellom, B. (2003). A distributed architecture for robust automatic speech recognition. In Proc. ICASSP (Vol. 1, pp. 328–331).

  • Hagen, A., Pellom, B., & Connors, D. A. (2003). Analysis and design of architecture systems for speech recognition on modern handheld-computing devices. In Proc. of the 11th international symposium on hardware/software codesign.

  • Hirsch, H.-G., & Pearce, D. (2000). The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In Proc. ISCA ITRW ASR2000 (pp. 181–188), Paris, France.

  • Huerta, J. M. (2000). Speech recognition in mobile environments. PhD thesis, Carnegie Mellon University.

  • Informa Telecoms & Media (2007). Super 3g mobile handsets set to top global market share by 2012. Press release.

  • Intel (2006). Intel performance libraries. http://www.intel.com/cd/software/products/asmo-na/eng/perflib/index.htm.

  • Ion, V., & Haeb-Umbach, R. (2005). A unified probabilistic approach to error concealment for distributed speech recognition. In Proc. interspeech 2005 ICSLP.

  • James, A., & Milner, B. (2005). Soft decoding of temporal derivatives for robust distributed speech recognition in packet loss. In Proc. ICASSP (Vol. 1, pp. 345–348).

  • Köhler, T. W., Fügen, C., Stüker, S., & Waibel, A. (2005). Rapid porting of ASR-systems to mobile devices. In Proc. of the 9th European conference on speech communication and technology (pp. 233–236).

  • Market Intelligence Center (2008). Global mobile phone subscribers forecasted to reach 4.5 billion by 2012. Press Release.

  • Novak, M. (2004). Towards large vocabulary ASR on embedded platforms. In Proc. interspeech 2004 ICSLP.

  • Novak, M., Hampl, R., Krbec, P., Bergl, V., & Sedivy, J. (2003). Two-pass search strategy for large list recognition on embedded speech recognition platforms. In Proc. ICASSP (Vol. 1, pp. 200–203).

  • Odell, J., Ollason, D., Woodland, P., Young, S., & Jansen, J. (1995). The HTK book for HTK V2.0. Cambridge: Cambridge University Press.

    Google Scholar 

  • Ortmanns, S., Firzlaff, T., & Ney, H. (1997). Fast likelihood computation methods for continuous mixture densities in large vocabulary speech recognition. In Proc. Eurospeech’97 (pp. 139–142), Rhodes, Greece.

  • Paliwal, K. K., & So, S. (2004). Scalable distributed speech recognition using multi-frame GMM-based block quantization. In Proc. interspeech 2004 ICSLP.

  • Peláez-Moreno, C., Gallardo-Antolín, A., & Díaz-de-María, F. (2001). Recognizing voice over IP: A robust front-end for speech recognition on the world wide web. IEEE Transactions on Multimedia 3(2).

  • Pellom, B., & Hacioglu, K. (2001). Sonic: The university of Colorado continuous speech recognition system (Technical Report TR-CSLR-2001-01). University of Colorado.

  • Rabiner, L., & Juang, B.-H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.

    Google Scholar 

  • Raj, B., Migdal, J., & Singh, R. (2001). Distributed speech recognition with codec parameters. In Proc. ASRU’2001.

  • Rose, R. C., & Partharathy, S. (2002). A tutorial on ASR for wireless mobile devices. In ICSLP.

  • Rose, R., Arizmendi, I., & Parthasarathy, S. (2003). An efficient framework for robust mobile speech recognition services. In Proc. ICASSP (Vol. 1, pp. 316–319).

  • Schmitt, A., Hank, C., & Liscombe, J. (2008). Detecting problematic calls with automated agents. In 4th IEEE tutorial and research workshop perception and interactive technologies for speech-based systems, Irsee, Germany.

  • So, S., & Paliwal, K. K. (2004). Scalable distributed speech recognition using multi-frame gmm-based block quantization. In Proc. int. conf. spoken language processing, Jeju, Korea.

  • Vasilache, M., Iso-Sipilä, J., & Viikki, O. (2004). On a practical design of a low complexity speech recognition engine. In Proc. ICASSP (Vol. 5, pp. 113–116).

  • Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., Wolf, P., & Woelfel, J. (2004). Sphinx-4: A flexible open source framework for speech recognition (Technical Report TR-2004-139). Sun Microsystems Laboratories.

  • Zaykovskiy, D., & Schmitt, A. (2007). Java to micro edition front-end for distributed speech recognition systems. In The 2007 IEEE international symposium on ubiquitous computing and intelligence (UCI’07), Niagara Falls, Canada.

  • Zaykovskiy, D., & Schmitt, A. (2008). Java vs. Symbian: A comparison of software-based DSR implementations on mobile phones. In 4th IET international conference on intelligent environments, Seattle, USA.

  • Zaykovskiy, D., Schmitt, A., & Lutz, M. (2007). New use of mobile phones: Towards multimodal information access systems. In 3rd IET international conference on intelligent environments, Ulm, Germany.

Download references

Author information

Correspondence to Alexander Schmitt.

About this article

Cite this article

Schmitt, A., Zaykovskiy, D. & Minker, W. Speech recognition for mobile devices. Int J Speech Technol 11, 63–72 (2008). https://doi.org/10.1007/s10772-009-9036-6
