Speech Recognition on Mobile Devices

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 5960)

Abstract

Enthusiasm for deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on devices such as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications in command and control, text entry and search are presented, with an emphasis on mobile text entry.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Tan, ZH., Lindberg, B. (2010). Speech Recognition on Mobile Devices. In: Jiang, X., Ma, M.Y., Chen, C.W. (eds) Mobile Multimedia Processing. WMMP 2008. Lecture Notes in Computer Science, vol 5960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12349-8_13

  • DOI: https://doi.org/10.1007/978-3-642-12349-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12348-1

  • Online ISBN: 978-3-642-12349-8

  • eBook Packages: Computer Science, Computer Science (R0)
