
Vouch: multimodal touch-and-voice input for smart watches under difficult operating conditions

  • Original Paper
  • Published:
Journal on Multimodal User Interfaces

Abstract

We consider a multimodal method for smart-watch text entry, called “Vouch,” which combines touch and voice input. Touch input is familiar and ergonomically accessible, but it is limited by the fat-finger problem (or, equivalently, by the small screen size) and is sensitive to user motion. Voice input is largely immune to slow user motion, but its reliability may suffer from environmental noise. These complementary characteristics can offset each other's weaknesses under the difficult operating conditions typical of smart watches. With Vouch, the user makes an approximate touch among the densely packed alphabetic keys; the accompanying voice input is then used to disambiguate the intended key from among the possible candidates, if not to identify it outright. We present a prototype implementation of the proposed multimodal input method and compare its performance and usability to those of the conventional unimodal method. We focus particularly on the potential improvement under difficult operating conditions, such as when the user is in motion. The comparative experiment validates our hypothesis that the Vouch multimodal approach offers more reliable recognition performance and higher usability.
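
As a minimal illustrative sketch (not the authors' implementation), the touch-and-voice disambiguation described above might be expressed as follows. The key coordinates, the Gaussian touch model, and all names (KEY_CENTERS, touch_likelihood, fuse) are assumptions introduced purely for illustration.

    # Illustrative sketch only: fuse an approximate touch point with
    # voice-recognition candidates to pick the intended key.
    import math

    # Hypothetical key centers on a tiny QWERTY layout, in screen millimetres.
    KEY_CENTERS = {"q": (2.0, 2.0), "w": (6.0, 2.0), "e": (10.0, 2.0), "a": (4.0, 6.0)}

    def touch_likelihood(touch_xy, key, sigma_mm=3.0):
        # Gaussian likelihood that a touch at touch_xy was aimed at `key`.
        kx, ky = KEY_CENTERS[key]
        d2 = (touch_xy[0] - kx) ** 2 + (touch_xy[1] - ky) ** 2
        return math.exp(-d2 / (2.0 * sigma_mm ** 2))

    def fuse(touch_xy, voice_candidates):
        # voice_candidates maps candidate letters to recognizer confidences.
        # Multiply voice confidence by touch likelihood and take the best key.
        scores = {}
        for key, voice_conf in voice_candidates.items():
            if key in KEY_CENTERS:
                scores[key] = voice_conf * touch_likelihood(touch_xy, key)
        return max(scores, key=scores.get) if scores else None

    # Example: a touch that lands between 'q' and 'w' is resolved by voice.
    print(fuse((4.0, 2.0), {"q": 0.2, "w": 0.7, "e": 0.1}))  # -> 'w'

Multiplying the two scores treats the touch and voice evidence as independent, so either modality can resolve the letter when the other is ambiguous.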


Acknowledgements

This research was supported in part by the Basic Science Research Program funded through the National Research Foundation of Korea (NRF) and the Ministry of Science, ICT & Future Planning (No. 2011-0030079), and in part by the Forensic Research Program of the National Forensic Service (NFS) and the Ministry of Government Administration and Home Affairs (NFS-2017-DIGITAL-06).

Author information

Corresponding author

Correspondence to Gerard Jounghyun Kim.


About this article


Cite this article

Lee, J., Lee, C. & Kim, G.J. Vouch: multimodal touch-and-voice input for smart watches under difficult operating conditions. J Multimodal User Interfaces 11, 289–299 (2017). https://doi.org/10.1007/s12193-017-0246-y
