Abstract
We propose a multimodal method for smart-watch text entry, called “Vouch,” which combines touch and voice input. Touch input is familiar and ergonomically accessible, but it is limited by the fat-finger problem (equivalently, by the small screen size) and is sensitive to user motion. Voice input is mostly immune to slow user motion, but its reliability may suffer from environmental noise. Together, however, the two modalities can complement each other under the difficult operating conditions typical of smart watches. With Vouch, the user makes an approximate touch among the densely packed alphabetic keys; the accompanying voice input is then used to disambiguate the intended key from among the nearby candidates, if not identify it outright. We present a prototype implementation of the proposed multimodal input method and compare its performance and usability with those of the conventional unimodal method, focusing in particular on the potential improvement under difficult operating conditions, such as when the user is in motion. A comparative experiment validates our hypothesis that the Vouch multimodal approach yields more reliable recognition performance and higher usability.
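The following is a minimal sketch (not the authors' implementation) of the kind of touch-and-voice fusion the abstract describes: an imprecise touch ranks nearby keys by proximity, and the speech recognizer's candidate letters, with their confidences, disambiguate among them. The key coordinates, the Gaussian proximity score, the weighting factor, and the score-fusion formula are all illustrative assumptions.

```python
import math

# Hypothetical key centers on a tiny QWERTY layout, in screen pixels.
KEY_CENTERS = {"q": (10, 10), "w": (30, 10), "e": (50, 10), "r": (70, 10)}

def touch_scores(touch_xy, sigma=15.0):
    """Score each key by its proximity to the touch point (Gaussian falloff)."""
    tx, ty = touch_xy
    scores = {}
    for key, (kx, ky) in KEY_CENTERS.items():
        d2 = (tx - kx) ** 2 + (ty - ky) ** 2
        scores[key] = math.exp(-d2 / (2.0 * sigma ** 2))
    return scores

def fuse(touch_xy, voice_candidates, alpha=0.5):
    """Combine touch proximity with voice-recognition confidences.

    voice_candidates: letters mapped to recognizer confidence in [0, 1],
    e.g. the n-best hypotheses for a spoken letter name.
    """
    t = touch_scores(touch_xy)
    fused = {key: alpha * t[key] + (1.0 - alpha) * voice_candidates.get(key, 0.0)
             for key in KEY_CENTERS}
    # Return the most likely intended key.
    return max(fused, key=fused.get)

# Example: an imprecise touch lands between "w" and "e", but the spoken
# letter name is recognized mostly as "e", so "e" is selected.
print(fuse((42, 12), {"e": 0.7, "w": 0.2, "r": 0.1}))
```

In this sketch the voice input resolves touches that fall between keys, while the touch input restricts the set of plausible letters the recognizer must distinguish; the relative weight (alpha) would in practice be tuned to the noise and motion conditions.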
Acknowledgements
This research was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (No. 2011-0030079), and in part by the Forensic Research Program of the National Forensic Service (NFS), Ministry of Government Administration and Home Affairs (NFS-2017-DIGITAL-06).
Cite this article
Lee, J., Lee, C. & Kim, G.J. Vouch: multimodal touch-and-voice input for smart watches under difficult operating conditions. J Multimodal User Interfaces 11, 289–299 (2017). https://doi.org/10.1007/s12193-017-0246-y