Speech Recognition on Mobile Devices

Tan, Zheng-Hua; Lindberg, Børge

doi:10.1007/978-3-642-12349-8_13

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 5960)

Abstract

The enthusiasm for deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on devices such as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications in command and control, text entry and search are presented, with an emphasis on mobile text entry.

Author information

Authors and Affiliations

Multimedia Information and Signal Processing (MISP), Department of Electronic Systems, Aalborg University, Aalborg, Denmark
Zheng-Hua Tan & Børge Lindberg

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Münster, Einsteinstrasse 62, 48149, Münster, Germany
Xiaoyi Jiang
Scientific Works, 6 Tiffany Court, 08550, Princeton Junction, NJ, USA
Matthew Y. Ma
Department of Computer Science and Engineering, State University of New York at Buffalo, 201 Bell Hall, 14260-2000, Buffalo, NY, USA
Chang Wen Chen

About this chapter

Cite this chapter

Tan, ZH., Lindberg, B. (2010). Speech Recognition on Mobile Devices. In: Jiang, X., Ma, M.Y., Chen, C.W. (eds) Mobile Multimedia Processing. WMMP 2008. Lecture Notes in Computer Science, vol 5960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12349-8_13

DOI: https://doi.org/10.1007/978-3-642-12349-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12348-1
Online ISBN: 978-3-642-12349-8
eBook Packages: Computer Science, Computer Science (R0)
