Speech recognition for mobile devices

Abstract

This article presents an overview of different approaches for providing automatic speech recognition (ASR) technology to mobile users. Three principal system architectures, distinguished by how they employ the wireless communication link, are analyzed: Embedded Speech Recognition Systems, Network Speech Recognition (NSR) and Distributed Speech Recognition (DSR). The article surveys the solutions that have been standardized so far and offers a critical analysis of the latest developments in speech recognition for mobile environments. Open issues as well as the pros and cons of the different methodologies and techniques are highlighted, with special emphasis on the constraints and limitations that ASR applications face under each architecture.
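
To make the architectural distinction concrete, the following minimal Python sketch (not taken from the article) illustrates where the two main ASR stages, front-end feature extraction and decoding, run in each of the three architectures. The feature extractor and the decoder here are toy placeholders; in a real DSR deployment the client would run a standardized front-end (e.g. the ETSI DSR front-end) and the server a full recognizer.

```python
# Illustrative sketch only: which ASR stages run on the handset vs. the server
# in embedded, network (NSR) and distributed (DSR) speech recognition.
# The "front-end" and "decoder" below are hypothetical stand-ins.

import numpy as np


def extract_features(waveform: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Toy front-end: per-frame log energy stands in for MFCC extraction."""
    n_frames = 1 + max(0, (len(waveform) - frame_len) // hop)
    frames = np.stack([waveform[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)[:, None]


def decode(features: np.ndarray) -> str:
    """Toy back-end: a real system would run an HMM/DNN search here."""
    return f"<hypothesis from {features.shape[0]} frames>"


def embedded_asr(waveform: np.ndarray) -> str:
    # Embedded: front-end and decoder both run on the handset; no uplink needed.
    return decode(extract_features(waveform))


def network_asr(waveform: np.ndarray) -> str:
    # NSR: the handset transmits (codec-compressed) speech; the server does everything.
    speech_sent_over_uplink = waveform                        # stand-in for a codec bitstream
    return decode(extract_features(speech_sent_over_uplink))  # server side


def distributed_asr(waveform: np.ndarray) -> str:
    # DSR: the handset extracts and compresses features; only the feature stream
    # crosses the wireless link, while decoding stays on the server.
    features_sent_over_uplink = extract_features(waveform)    # client side
    return decode(features_sent_over_uplink)                  # server side


if __name__ == "__main__":
    audio = np.random.randn(16000)  # one second of fake 16 kHz audio
    for system in (embedded_asr, network_asr, distributed_asr):
        print(system.__name__, "->", system(audio))
```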

References

  • 3GPP (2004). Recognition performance evaluations of codecs for speech enabled services (SES) (3GPP TR 26.943).

  • Bocchieri, E. (2008). In Automatic speech recognition on mobile devices and over communication networks (advances in pattern recognition), Fixed-point arithmetic (pp. 255–274). Berlin: Springer.

    Chapter  Google Scholar 

  • ETSI (2002). Distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithm (ETSI Standard ES 202 050).

  • Fingscheidt, T., & Vary, P. (2001). Softbit speech decoding: A new approach to error concealment. IEEE Transactions on Speech and Audio Processing, 9(3), 240–251.

    Article  Google Scholar 

  • Gartner (2009). Gartner says worldwide smartphone sales reached its lowest growth rate with 3.7 per cent increase in fourth quarter of 2008. Press release.

  • Hacioglu, K., & Pellom, B. (2003). A distributed architecture for robust automatic speech recognition. In Proc. ICASSP (Vol. 1, pp. 328–331).

  • Hagen, A., Pellom, B., & Connors, D. A. (2003). Analysis and design of architecture systems for speech recognition on modern handheld-computing devices. In Proc. of the 11th international symposium on hardware/software codesign.

  • Hirsch, H.-G., & Pearce, D. (2000). The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In Proc. ISCA ITRW ASR2000 (pp. 181–188), Paris, France.

  • Huerta, J. M. (2000). Speech recognition in mobile environments. PhD thesis, Carnegie Mellon University.

  • Informa Telecoms & Media (2007). Super 3g mobile handsets set to top global market share by 2012. Press release.

  • Intel (2006). Intel performance libraries. http://www.intel.com/cd/software/products/asmo-na/eng/perflib/index.htm.

  • Ion, V., & Haeb-Umbach, R. (2005). A unified probabilistic approach to error concealment for distributed speech recognition. In Proc. interspeech 2005 ICSLP.

  • James, A., & Milner, B. (2005). Soft decoding of temporal derivatives for robust distributed speech recognition in packet loss. In Proc. ICASSP (Vol. 1, pp. 345–348).

  • Köhler, T. W., Fügen, C., Stüker, S., & Waibel, A. (2005). Rapid porting of ASR-systems to mobile devices. In Proc. of the 9th European conference on speech communication and technology (pp. 233–236).

  • Market Intelligence Center (2008). Global mobile phone subscribers forecasted to reach 4.5 billion by 2012. Press Release.

  • Novak, M. (2004). Towards large vocabulary ASR on embedded platforms. In Proc. interspeech 2004 ICSLP.

  • Novak, M., Hampl, R., Krbec, P., Bergl, V., & Sedivy, J. (2003). Two-pass search strategy for large list recognition on embedded speech recognition platforms. In Proc. ICASSP (Vol. 1, pp. 200–203).

  • Odell, J., Ollason, D., Woodland, P., Young, S., & Jansen, J. (1995). The HTK book for HTK V2.0. Cambridge: Cambridge University Press.

    Google Scholar 

  • Ortmanns, S., Firzlaff, T., & Ney, H. (1997). Fast likelihood computation methods for continuous mixture densities in large vocabulary speech recognition. In Proc. Eurospeech’97 (pp. 139–142), Rhodes, Greece.

  • Paliwal, K. K., & So, S. (2004). Scalable distributed speech recognition using multi-frame GMM-based block quantization. In Proc. interspeech 2004 ICSLP.

  • Peláez-Moreno, C., Gallardo-Antolín, A., & Díaz-de-María, F. (2001). Recognizing voice over IP: A robust front-end for speech recognition on the world wide web. IEEE Transactions on Multimedia 3(2).

  • Pellom, B., & Hacioglu, K. (2001). Sonic: The university of Colorado continuous speech recognition system (Technical Report TR-CSLR-2001-01). University of Colorado.

  • Rabiner, L., & Juang, B.-H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.

    Google Scholar 

  • Raj, B., Migdal, J., & Singh, R. (2001). Distributed speech recognition with codec parameters. In Proc. ASRU’2001.

  • Rose, R. C., & Partharathy, S. (2002). A tutorial on ASR for wireless mobile devices. In ICSLP.

  • Rose, R., Arizmendi, I., & Parthasarathy, S. (2003). An efficient framework for robust mobile speech recognition services. In Proc. ICASSP (Vol. 1, pp. 316–319).

  • Schmitt, A., Hank, C., & Liscombe, J. (2008). Detecting problematic calls with automated agents. In 4th IEEE tutorial and research workshop perception and interactive technologies for speech-based systems, Irsee, Germany.

  • So, S., & Paliwal, K. K. (2004). Scalable distributed speech recognition using multi-frame gmm-based block quantization. In Proc. int. conf. spoken language processing, Jeju, Korea.

  • Vasilache, M., Iso-Sipilä, J., & Viikki, O. (2004). On a practical design of a low complexity speech recognition engine. In Proc. ICASSP (Vol. 5, pp. 113–116).

  • Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., Wolf, P., & Woelfel, J. (2004). Sphinx-4: A flexible open source framework for speech recognition (Technical Report TR-2004-139). Sun Microsystems Laboratories.

  • Zaykovskiy, D., & Schmitt, A. (2007). Java to micro edition front-end for distributed speech recognition systems. In The 2007 IEEE international symposium on ubiquitous computing and intelligence (UCI’07), Niagara Falls, Canada.

  • Zaykovskiy, D., & Schmitt, A. (2008). Java vs. Symbian: A comparison of software-based DSR implementations on mobile phones. In 4th IET international conference on intelligent environments, Seattle, USA.

  • Zaykovskiy, D., Schmitt, A., & Lutz, M. (2007). New use of mobile phones: Towards multimodal information access systems. In 3rd IET international conference on intelligent environments, Ulm, Germany.

Download references

Author information

Correspondence to Alexander Schmitt.

About this article

Cite this article

Schmitt, A., Zaykovskiy, D. & Minker, W. Speech recognition for mobile devices. Int J Speech Technol 11, 63–72 (2008). https://doi.org/10.1007/s10772-009-9036-6
