Consistent categorization of multimodal integration patterns during human–computer interaction

Journal on Multimodal User Interfaces

Abstract

Multimodal interaction offers a more natural style of human–computer interaction, allowing users to apply their well-developed communicative skills when interacting with computer systems. Designing reliable multimodal systems, however, remains a challenging task. Methods that deliver optimal performance depend on precise modeling of integration patterns, which allows a system to adapt to the preferences and differences of individual users. Although the basic foundations and empirical evidence for these differences have been described and confirmed in previous research, the measures and classifications introduced so far appear oversimplified and insufficiently precise for designing reliable and robust interaction models. This paper presents the results of our study of multimodal integration patterns in systems combining speech and gesture input. We confirm the interaction differences among subjects and their specific multimodal integration patterns reported earlier, and complement them with our own findings. Based on the obtained results, a new integration pattern categorization is defined and analyzed. The introduced categorization yields more reliable and consistent results than the classifications presented in the related literature and, owing to its generality, is applicable to other combinations of input modalities.




Notes

  1. The original algorithm is rotation invariant (see the first sketch following these notes).

  2. http://www.anvil-software.org

  3. To distinguish between the two definitions, the one from Oviatt et al. will be denoted SEQ\(_O\)/SIM\(_O\) and our redefinition SEQ\(_R\)/SIM\(_R\) throughout the rest of the work (see the second sketch following these notes).
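
Note 1 most likely refers to the $1 gesture recognizer of Wobbrock et al. [24], whose original form is rotation invariant because every candidate stroke is rotated so that its "indicative angle" (the angle from the centroid to the first sampled point) becomes zero before template matching. The following is a minimal Python sketch of that normalization step, with illustrative names rather than the authors' implementation:

    import math

    def centroid(points):
        """Arithmetic mean of a list of (x, y) points."""
        xs, ys = zip(*points)
        return sum(xs) / len(xs), sum(ys) / len(ys)

    def rotate_to_zero(points):
        """Rotate a stroke about its centroid so that the indicative
        angle (centroid -> first point) becomes zero; this is the step
        that makes the original $1 recognizer rotation invariant."""
        cx, cy = centroid(points)
        x0, y0 = points[0]
        theta = math.atan2(cy - y0, cx - x0)  # indicative angle
        cos_t, sin_t = math.cos(-theta), math.sin(-theta)
        return [(cx + (x - cx) * cos_t - (y - cy) * sin_t,
                 cy + (x - cx) * sin_t + (y - cy) * cos_t)
                for x, y in points]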
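
Note 3 contrasts the original definitions by Oviatt et al., under which a multimodal construction is simultaneous (SIM) when the speech and gesture signals overlap in time and sequential (SEQ) when one signal ends before the other begins, with the redefined variants used in this work. The Python sketch below illustrates only the original SEQ\(_O\)/SIM\(_O\) overlap test; the redefined SEQ\(_R\)/SIM\(_R\) criterion is specific to this paper and is not reproduced here, and the interval representation is an assumption made for illustration:

    from dataclasses import dataclass

    @dataclass
    class Signal:
        onset: float   # signal start time in seconds
        offset: float  # signal end time in seconds

    def classify_o(speech: Signal, gesture: Signal) -> str:
        """SEQ_O/SIM_O in the sense of Oviatt et al.: SIM if the two
        signals overlap in time, SEQ if there is a lag between them."""
        overlap = (min(speech.offset, gesture.offset)
                   - max(speech.onset, gesture.onset))
        return "SIM_O" if overlap > 0 else "SEQ_O"

    # Example: speech starting 0.4 s after the gesture ends is sequential.
    print(classify_o(Signal(1.8, 2.9), Signal(0.2, 1.4)))  # -> SEQ_O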

References

  1. Bangalore S, Johnston M (2009) Robust understanding in multimodal interfaces. Comput Linguist 35(3):345–397. doi:10.1162/coli.08-022-R2-06-26

  2. Billinghurst M, Lee M (2012) Multimodal interfaces for augmented reality. In: Expanding the frontiers of visual analytics and visualization. Springer, pp 449–465. doi:10.1007/978-1-4471-2804-5

  3. Bolt RA (1980) Put-that-there: voice and gesture at the graphics interface. In: Proceedings of the 7th annual conference on computer graphics and interactive techniques - SIGGRAPH ’80, vol 32. ACM Press, pp 262–270. doi:10.1145/800250.807503

  4. Cohen PR, Johnston M, McGee D, Oviatt S, Pittman J, Smith I, Chen L, Clow J (1997) QuickSet: multimodal interaction for distributed applications. In: Proceedings of the fifth ACM international conference on multimedia-MULTIMEDIA ’97, ACM Press, pp 31–40. doi:10.1145/266180.266328

  5. Cohen PR, Kaiser EC, Buchanan MC, Lind S, Corrigan MJ, Wesson RM (2015) Sketch-Thru-Plan: a multimodal interface for command and control. Commun ACM 58(4):56–65. doi:10.1145/2735589

  6. Dumas B, Lalanne D, Oviatt S (2009) Multimodal interfaces: a survey of principles, models and frameworks. In: Lalanne D, Kohlas J (eds) Human machine interaction, Lecture notes in computer science, vol 5440. Springer, Berlin, pp 3–26. doi:10.1007/978-3-642-00437-7_1

  7. Ehlen P, Johnston M (2012) Multimodal interaction patterns in mobile local search. In: Proceedings of the 2012 ACM international conference on intelligent user interfaces - IUI ’12, pp 21–24 . doi:10.1145/2166966.2166970

  8. Haas EC, Pillalamarri KS, Stachowiak CC, McCullough G (2011) Temporal binding of multimodal controls for dynamic map displays. In: Proceedings of the 13th international conference on multimodal interfaces - ICMI ’11, ACM Press, p 409. doi:10.1145/2070481.2070558

  9. Huang X, Oviatt S (2006) Toward adaptive information fusion in multimodal systems. In: Renals S, Bengio S (eds) Machine learning for multimodal interaction, Lecture notes in computer science, vol 3869. Springer, Berlin, pp 15–27. doi:10.1007/11677482_2

  10. Huang X, Oviatt S, Lunsford R (2006) Combining user modeling and machine learning to predict users’ multimodal integration patterns. In: Renals S, Bengio S, Fiscus JG (eds) Machine learning for multimodal interaction, Lecture notes in computer science, vol 4299. Springer, Berlin, pp 50–62. doi:10.1007/11965152_5

  11. Huggins-Daines D, Kumar M, Chan A, Black A, Ravishankar M, Rudnicky A (2006) Pocketsphinx: a free, real-time continuous speech recognition system for hand-held devices. In: Proceedings of IEEE international conference on acoustics speech and signal processing. pp 185–188. doi:10.1109/ICASSP.2006.1659988

  12. Johnston M, Bangalore S (2005) Finite-state multimodal integration and understanding. Nat Lang Eng 11(2):159–187. doi:10.1017/S1351324904003572

  13. Johnston M, Bangalore S, Vasireddy G, Stent A, Ehlen P, Walker M, Whittaker S, Maloor P (2002) MATCH: an architecture for multimodal dialogue systems. In: Proceedings of the 40th annual meeting on association for computational linguistics - ACL ’02, pp 376–383. doi:10.3115/1073083.1073146

  14. Lee M, Billinghurst M, Baek W, Green R, Woo W (2013) A usability study of multimodal input in an augmented reality environment. Virtual Real 17(4):293–305. doi:10.1007/s10055-013-0230-0

  15. Lewis JR (2012) Usability testing. In: Handbook of human factors and ergonomics. Wiley, pp 1267–1312. doi:10.1002/9781118131350.ch46

  16. Oviatt S (1999) Ten myths of multimodal interaction. Commun ACM 42(11):74–81. doi:10.1145/319382.319398

  17. Oviatt S (2003) User-centered modeling and evaluation of multimodal interfaces. Proc IEEE 91(9):1457–1468. doi:10.1109/JPROC.2003.817127

  18. Oviatt S, Coulston R, Lunsford R (2004) When do we interact multimodally? In: Proceedings of the 6th international conference on multimodal interfaces - ICMI ’04, ACM Press, pp 129–136. doi:10.1145/1027933.1027957

  19. Oviatt S, Coulston R, Tomko S, Xiao B, Lunsford R, Wesson M, Carmichael L (2003) Toward a theory of organized multimodal integration patterns during human-computer interaction. In: Proceedings of the 5th international conference on multimodal interfaces - ICMI ’03, ACM Press, pp 44–51. doi:10.1145/958432.958443

  20. Oviatt S, DeAngeli A, Kuhn K (1997) Integration and synchronization of input modes during multimodal human-computer interaction. In: Proceedings of the SIGCHI conference on human factors in computing systems - CHI ’97, ACM Press, pp 415–422. doi:10.1145/258549.258821

  21. Oviatt S, Lunsford R, Coulston R (2005) Individual differences in multimodal integration patterns: what are they and why do they exist? In: Proceedings of the SIGCHI conference on human factors in computing systems - CHI ’05, ACM Press, pp 241–249. doi:10.1145/1054972.1055006

  22. Schüssel F, Honold F, Schmidt M, Bubalo N, Huckauf A, Weber M (2014) Multimodal interaction history and its use in error detection and recovery. In: Proceedings of the 16th international conference on multimodal interaction - ICMI ’14, ACM Press, pp 164–171. doi:10.1145/2663204.2663255

  23. Serrano M, Nigay L (2010) A Wizard of Oz component-based approach for rapidly prototyping and testing input multimodal interfaces. J Multimodal User Interfaces 3(3):215–225. doi:10.1007/s12193-010-0042-4

  24. Wobbrock JO, Wilson AD, Li Y (2007) Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th annual ACM symposium on user interface software and technology - UIST ’07, ACM Press, pp 159–169. doi:10.1145/1294211.1294238

  25. Xiao B, Girand C, Oviatt S (2002) Multimodal integration patterns in children. In: Proceedings of international conference on spoken language processing, pp 629–632

  26. Xiao B, Oviatt S (2003) Modeling multimodal integration patterns and performance in seniors: toward adaptive processing of individual differences. In: Proceedings of the 5th international conference on multimodal interfaces - ICMI ’03, pp 256–272. doi:10.1145/958432.958480

Acknowledgements

We would like to thank Michal Vondra for providing initial feedback during a pilot test, and all volunteers for participating in the study. Thanks also to the anonymous reviewers for their helpful comments and suggestions. This work has been supported by the Grant Agency of the Czech Technical University in Prague, Grant No. SGS16/156/OHK3/2T/13.

Author information

Correspondence to Roman Hak.

Appendix

Testing scenarios

The following list contains the complete set of objectives as presented to the test subjects:

  1. Zoom in and out of the map view.

  2. Get your current location and find the nearest petrol station.

  3. Get detailed information about two gas stations.

  4. Get directions between Olomouc and Liberec.

  5. Get information about cinemas at your location.

  6. Find the estimated travel time between an airport near Prague and a theatre in downtown Prague.

  7. Get the coordinates of at least 3 hospitals in Pilsen.

  8. Find the nearest police station and emergency services.

  9. Find the travel distance between a railway station in Brno and the closest airport.

  10. Find the name of the nearest bus and subway station.

  11. Find the names of some pubs and restaurants in downtown Ceske Budejovice.

  12. Find the phone numbers of libraries in the surrounding area.

  13. Get the postal address of a coffeehouse near a museum in Cesky Krumlov.

  14. Get the phone numbers and postal addresses of churches in the surrounding area of Brno.

  15. Get details of the two nearest restaurants at your current location.

  16. Find the travel distance from the westernmost to the easternmost point and then from the northernmost to the southernmost point of the Czech Republic.


Cite this article

Hak, R., Zeman, T. Consistent categorization of multimodal integration patterns during human–computer interaction. J Multimodal User Interfaces 11, 251–265 (2017). https://doi.org/10.1007/s12193-017-0243-1
