Abstract
Enthusiasm for deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on devices such as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within command and control, text entry and search are presented, with an emphasis on mobile text entry.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Tan, ZH., Lindberg, B. (2010). Speech Recognition on Mobile Devices. In: Jiang, X., Ma, M.Y., Chen, C.W. (eds) Mobile Multimedia Processing. WMMP 2008. Lecture Notes in Computer Science, vol 5960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12349-8_13
DOI: https://doi.org/10.1007/978-3-642-12349-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12348-1
Online ISBN: 978-3-642-12349-8
eBook Packages: Computer Science (R0)