Abstract
Multimodal interfaces are expected to improve the input and output capabilities of increasingly sophisticated applications. Several approaches aim at formally describing multimodal interaction, but they rarely treat it as a continuous flow of actions, preserving its dynamic nature and considering all modalities at the same level. This work proposes a model-based approach called Practice-oriented Analysis and Description of Multimodal Interaction (PALADIN), which describes sequential multimodal interaction in a way that overcomes these problems. It arranges a set of parameters to quantify multimodal interaction as a whole, in order to minimise the existing differences between modalities. Furthermore, interaction is described stepwise to preserve the dynamic nature of the dialogue process. PALADIN defines a common notation to describe interaction in different multimodal contexts, providing a framework to assess and compare the usability of systems. Our approach was integrated into four real applications to conduct two experiments with users. The experiments demonstrate the validity and effectiveness of the proposed model for analysing and evaluating multimodal interaction.







Notes
A barge-in attempt occurs when the user intentionally addresses the system while the system is still speaking, displaying information in a GUI, performing a gesture, or sending information through another modality.
A facade is an object that provides a simplified interface to a larger body of code, such as a class library or a software framework.
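To make the facade pattern concrete, here is a minimal sketch in Java. The class and method names (`Logger`, `Annotator`, `InteractionFacade`, `capture`) are hypothetical illustrations, not taken from the PALADIN code base; the point is only that a single simplified entry point hides two subsystems.

```java
// Two subsystems that client code would otherwise have to call separately.
// All names here are illustrative, not from the PALADIN implementation.
class Logger {
    String record(String event) { return "logged:" + event; }
}

class Annotator {
    String annotate(String event) { return "annotated:" + event; }
}

// The facade: one simplified method that coordinates both subsystems,
// so clients depend on a single interface instead of the larger body of code.
class InteractionFacade {
    private final Logger logger = new Logger();
    private final Annotator annotator = new Annotator();

    String capture(String event) {
        return logger.record(event) + " / " + annotator.annotate(event);
    }
}

public class FacadeDemo {
    public static void main(String[] args) {
        System.out.println(new InteractionFacade().capture("tap"));
    }
}
```

In the same spirit, an instrumentation facade lets an application report interaction events through one call while the details of logging and annotation stay hidden behind it.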
An open-source implementation of the Android HCI Extractor [36] can be downloaded from http://code.google.com/p/android-hci-extractor. More information related to this tool and its integration with the model and the framework described above can be found in [35].
References
Araki M, Kouzawa A, Tachibana K (2005) Proposal of a multimodal interaction description language for various interactive agents. Trans Inf Syst E88-D(11):2469–2476
Balbo S, Coutaz J, Salber D (1993) Towards automatic evaluation of multimodal user interfaces. In: Proceedings of the 1st international conference on intelligent user interfaces, IUI ’93. ACM, New York, NY, USA, pp 201–208
Balme L, Demeure A, Barralon N, Coutaz J, Calvary G (2004) Cameleon-rt: a software architecture reference model for distributed, migratable, and plastic user interfaces. In: Markopoulos P, Eggen B, Aarts EHL, Crowley JL (eds) EUSAI. Lecture Notes in Computer Science, vol 3295. Springer, Berlin, pp 291–302. http://dblp.uni-trier.de/db/conf/eusai/eusai2004.html#BalmeDBCC04
Bayer S, Damianos LE, Kozierok R, Mokwa J (1999) The MITRE multi-modal logger: its use in evaluation of collaborative systems. ACM Comput Surv 31(2es):17
Beringer N, Kartal U, Louka K, Schiel F, Türk U (2002) PROMISE—a procedure for multimodal interactive system evaluation. In: Proceedings of multimodal resources and multimodal systems evaluation workshop (LREC 2002), pp 77–80
Bernsen NO, Dybkjær L (2009) Multimodal usability. Springer, Berlin
Bourguet ML (2003) Designing and prototyping multimodal commands. In: Rauterberg M, Menozzi M, Wesson J (eds) INTERACT. IOS Press, Amsterdam. http://dblp.uni-trier.de/db/conf/interact/interact2003.html#Bourguet03
Carey R, Bell G (1997) The annotated VRML 2.0 reference manual. Addison-Wesley, Boston
Cohen PR, McGee DR (2004) Tangible multimodal interfaces for safety–critical applications. Commun ACM 47(1):41–46
Coutaz J, Nigay L, Salber D, Blandford A, May J, Young RM (1995) Four easy pieces for assessing the usability of multimodal interaction: the CARE properties. In: Arnesen SA, Gilmore D (eds) Proceedings of INTERACT’95 conference. Chapman & Hall, London, pp 115–120
Damianos LE, Drury J, Fanderclai T, Hirschman L, Kurtz J, Oshika B (2000) Evaluating multi-party multimodal systems. In: Proceedings of the second international conference on language resources and evaluation, vol 3. MIT Media Laboratory, pp 1361–1368
Diefenbach S, Hassenzahl M (2011) Handbuch zur Fun-ni Toolbox. Manual, Folkwang Universität der Künste. Retrieved 16 Oct 2013. http://fun-ni.org/wp-content/uploads/Diefenbach+Hassenzahl_2010_HandbuchFun-niToolbox.pdf
Ergonomics of human-system interaction (2006) Part 110: Dialogue principles (ISO 9241-110:2006)
Dumas B, Lalanne D, Ingold R (2010) Description languages for multimodal interaction: a set of guidelines and its illustration with SMUIML. J Multimodal User Interfaces 3:237–247
Dybkjær L, Bernsen NO, Minker W (2004) Evaluation and usability of multimodal spoken language dialogue systems. Speech Commun 43:33–54
Engelbrecht KP, Kruppa M, Möller S, Quade M (2008) MeMo Workbench for semi-automated usability testing. In: Proceedings of Interspeech 2008 incorporating SST 2008. ISCA, Brisbane, Australia, pp 1662–1665
Fraser NM (1997) Spoken dialogue system evaluation: a first framework for reporting results. In: EUROSPEECH-1997, pp 1907–1910
Fraser NM, Gilbert G (1991) Simulating speech systems. Comput Speech Lang 5(1):81–99
Göbel S, Hartmann F, Kadner K, Pohl C (2006) A device-independent multimodal mark-up language. In: Hochberger C, Liskowsky R (eds) INFORMATIK 2006. Informatik für Menschen, LNI, vol 94. Gesellschaft für Informatik, pp 170–177
Gong XG, Engelbrecht KP (2013) The influence of user characteristics on the quality of judgment prediction models for tablet applications. In: 10. Berliner Werkstatt, pp 198–204
GNU general public license. http://www.gnu.org/licenses/gpl.html
Grice HP (1975) Logic and conversation. Syntax Semant 3:41–58
Johnston M (2009) EMMA: extensible multimodal annotation markup language. W3C recommendation, W3C (2009) http://www.w3.org/TR/2009/REC-emma-20090210/
Jöst M, Häußler J, Merdes M, Malaka R (2005) Multimodal interaction for pedestrians: an evaluation study. In: Amant RS, Riedl J, Jameson A (eds) Proceedings of the 10th international conference on intelligent user interfaces. ACM, New York, pp 59–66
Jouault F, Allilaire F, Bézivin J, Kurtev I (2008) ATL: a model transformation tool. Sci Comput Program 72(1–2):31–39
Kranstedt A, Kopp S, Wachsmuth I (2002) MURML: a multimodal utterance representation markup language for conversational agents. In: Proceedings of AAMAS02 workshop on embodied conversational agents—let’s specify and evaluate them
Kühnel C, Weiss B, Möller S (2010) Parameters describing multimodal interaction—definitions and three usage scenarios. In: Kobayashi T, Hirose K, Nakamura S (eds) Proceedings of the 11th annual conference of the ISCA (Interspeech 2010). ISCA, Makuhari, pp 2014–2017
Larson JA, Raggett D, Raman TV (2003) W3C multimodal interaction framework. W3C note, W3C (2003). http://www.w3.org/TR/2003/NOTE-mmi-framework-20030506/
Leech G, Wilson A (1996) EAGLES recommendations for the morphosyntactic annotation of corpora. http://www.ilc.cnr.it/EAGLES96/annotate/annotate.html
Lemmelä S, Vetek A, Mäkelä K, Trendafilov D (2008) Designing and evaluating multimodal interaction for mobile contexts. In: Digalakis V, Potamianos A, Turk M, Pieraccini R, Ivanov Y (eds) Proceedings of the 10th international conference on multimodal interfaces. ACM, New York, pp 265–272
Limbourg Q, Vanderdonckt J, Michotte B, Bouillon L, López-Jaquero V (2005) USIXML: a language supporting multi-path development of user interfaces. In: Bastide R, Palanque P, Roth J (eds) Engineering human computer interaction and interactive systems. Lecture Notes in Computer Science, vol 3425, chap. 12. Springer, Berlin, pp 134–135. doi:10.1007/11431879_12
Malhotra A, Biron PV (2004) XML schema part 2: datatypes second edition. W3C recommendation, W3C. http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
Manca M, Paternó F (2010) Supporting multimodality in service-oriented model-based development environments. In: Bernhaupt R, Forbrig P, Gulliksen J, Lárusdóttir M (eds) HCSE. Lecture Notes in Computer Science, vol 6409. Springer, Berlin, pp 135–148. http://dblp.uni-trier.de/db/conf/hcse/hcse2010.html#MancaP10
Martin JC, Kipp M (2002) Annotating and measuring multimodal behaviour—tycoon metrics in the anvil tool. In: LREC. European Language Resources Association. http://dblp.uni-trier.de/db/conf/lrec/lrec2002.html#MartinK02
Mateo P (2012) Android HCI extractor and the MIM project: integration and usage tutorial. http://www.catedrasaes.org/wiki/MIM. Accessed 04 Nov 2013
Mateo P, Hillmann S (2012) Android HCI Extractor. http://code.google.com/p/android-hci-extractor. Accessed 04 Nov 2013
Mateo P, Hillmann S (2013) Instantiation framework for the PALADIN interaction model. https://github.com/pedromateo/paladin_instantiation. Accessed 04 Nov 2013
Mateo P, Hillmann S (2013) PALADIN: a run-time model for automatic evaluation of multimodal interfaces. https://github.com/pedromateo/paladin. Accessed 04 Nov 2013
Mateo Navarro PL, Martínez Pérez G, Sevilla Ruiz D (2014) A context-aware interaction model for the analysis of users' QoE in mobile environments. Int J Hum Comput Interact, Taylor & Francis (in press)
Möller S (2005) Parameters describing the interaction with spoken dialogue systems. ITU-T Recommendation Supplement 24 to P-Series, International Telecommunication Union, Geneva, Switzerland. Based on ITU-T Contr. COM 12–17 (2009)
Möller S (2005) Quality of telephone-based spoken dialogue systems. Springer, New York
Möller S (2011) Parameters describing the interaction with multimodal dialogue systems. ITU-T Recommendation Supplement 25 to P-Series Rec., International Telecommunication Union, Geneva, Switzerland
Nigay L, Coutaz J (1993) A design space for multimodal systems: concurrent processing and data fusion. In: Ashlund S, Mullet K, Henderson A, Hollnagel E, White TN (eds) Proceedings of INTERACT ’93 and CHI ’93 conference on human factors in computing systems. ACM, New York, pp 172–178
Olmedo-Rodríguez H, Escudero-Mancebo D, Cardeñoso Payo V (2009) Evaluation proposal of a framework for the integration of multimodal interaction in 3D worlds. In: Proceedings of the 13th international conference on human–computer interaction. Part II: Novel Interaction Methods and Techniques. Springer, Berlin, pp 84–92
Oshry M, Baggia P, Rehor K, Young M, Akolkar R, Yang X, Barnett J, Hosn R, Auburn R, Carter J, McGlashan S, Bodell M, Burnett DC (2009) Voice extensible markup language (VoiceXML) 3.0. W3C working draft, W3C. http://www.w3.org/TR/2009/WD-voicexml30-20091203/
Oviatt S (1999) Ten myths of multimodal interaction. Commun ACM 42:74–81
Oviatt S (2003) Advances in robust multimodal interface design. IEEE Comput Graph Appl 23:62–68
Palanque PA, Schyn A (2003) A model-based approach for engineering multimodal interactive systems. In: Rauterberg M, Menozzi M, Wesson J (eds) INTERACT’03. IOS Press, Amsterdam, pp 543–550
Paternò F, Santoro C, Spano LD (2009) MARIA: a universal, declarative, multiple abstraction-level language for service-oriented applications in ubiquitous environments. ACM Trans Comput Hum Interact 16(4):1–30. doi:10.1145/1614390.1614394
Pelachaud C (2005) Multimodal expressive embodied conversational agents. In: Zhang H, Chua TS, Steinmetz R, Kankanhalli MS, Wilcox L (eds) ACM multimedia. ACM, New York, pp 683–689
Perakakis M, Potamianos A (2007) The effect of input mode on inactivity and interaction times of multimodal systems. In: Massaro DW, Takeda K, Roy D, Potamianos A (eds) Proceedings of the 9th international conference on multimodal interfaces (ICMI 2007). ACM, New York, pp 102–109
Perakakis M, Potamianos A (2008) Multimodal system evaluation using modality efficiency and synergy metrics. In: Proceedings of the 10th international conference on multimodal interfaces (ICMI’08). ACM, New York, pp 9–16
Schatzmann J, Georgila K, Young S (2005) Quantitative evaluation of user simulation techniques for spoken dialogue systems. In: Dybkjær L, Minker W (eds) Proceedings of the 6th SIGdial workshop on discourse and dialogue. Special Interest Group on Discourse and Dialogue (SIGdial), Association for Computational Linguistics (ACL), pp 45–54
Schatzmann J, Young S (2009) The hidden agenda user simulation model. IEEE Trans Audio Speech Lang Process 17(4):733–747
Schmidt S, Engelbrecht KP, Schulz M, Meister M, Stubbe J, Töppel M, Möller S (2010) Identification of interactivity sequences in interactions with spoken dialog systems. In: Proceedings of the 3rd international workshop on perceptual quality of systems. Chair of Communication Acoustics, TU Dresden, pp 109–114
Serrano M, Nigay L (2010) A wizard of Oz component-based approach for rapidly prototyping and testing input multimodal interfaces. J Multimodal User Interfaces 3(3):215–225. doi:10.1007/s12193-010-0042-4
Serrano M, Nigay L, Demumieux R, Descos J, Losquin P (2006) Multimodal interaction on mobile phones: development and evaluation using ACICARE. In: Nieminen M, Röykkee M (eds) MobileHCI ’06: Proceedings of the 8th conference on human–computer interaction with mobile devices and services. ACM, New York, pp 129–136
Sonntag D (2012) Collaborative multimodality. KI 26(2):161–168 http://dblp.uni-trier.de/db/journals/ki/ki26.html#Sonntag12
Steinberg D, Budinsky F, Paternostro M, Merks E (2009) EMF: Eclipse Modeling Framework, 2nd edn. Addison-Wesley, Upper Saddle River
Sturm J, Bakx I, Cranen B, Terken J, Wang F (2002) Usability evaluation of a Dutch multimodal system for train timetable information. In: Rodriguez MG, Araujo CS (eds) Proceedings of LREC 2002. 3rd International conference on language resources and evaluation, pp 255–261
Sutcliffe A (2008) Multimedia user interface design, chap. 20. Lawrence Erlbaum Associates, New Jersey, pp 393–410
Thompson HS, Maloney M, Beech D, Mendelsohn N (2004) XML schema part 1: Structures second edition. W3C recommendation, W3C. http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/
Vanacken D, Boeck JD, Raymaekers C, Coninx K (2006) NIMMIT: a notation for modeling multimodal interaction techniques. In: Braz J, Jorge JA, Dias M, Marcos A (eds) GRAPP. INSTICC—Institute for Systems and Technologies of Information, Control and Communication, pp 224–231. http://dblp.uni-trier.de/db/conf/grapp/grapp2006.html#VanackenBRC06
Walker M, Litman D, Kamm C, Abella A (1997) PARADISE: a framework for evaluating spoken dialogue agents. In: Proceedings of the 35th annual meeting of the association for computational linguistics, ACL 97, pp 262–270
Wechsung I, Engelbrecht KP, Kühnel C, Möller S, Weiss B (2012) Measuring the quality of service and quality of experience of multimodal human–machine interaction. J Multimodal User Interfaces 6:73–85
Acknowledgments
This work has been supported by the Cátedra SAES (http://www.catedrasaes.org), a private initiative of the University of Murcia (http://www.um.es) and SAES (Sociedad Anónima de Electrónica Submarina) (http://www.electronica-submarina.com), as well as by the Telekom Innovation Laboratories (http://www.laboratories.telekom.com) within Technische Universität Berlin (http://www.tu-berlin.de).
Appendices
Appendix A: Translations
Appendix B: Guidelines on features of multimodal description languages
See Table 10.
Appendix C: Parameters used in the model
The tables in this section give an overview of all parameters that are modified or newly introduced in PALADIN compared to ITU-T Suppl. 25 to P-Series Rec. [42]. Table 12 explains the abbreviations used in the subsequent tables. Furthermore, Table 11 provides an index listing each parameter (by its abbreviation) and the table or reference describing it.
See Tables 11, 12, 13, 14, 15 and 16.
Cite this article
Mateo Navarro, P.L., Hillmann, S., Möller, S. et al. Run-time model based framework for automatic evaluation of multimodal interfaces. J Multimodal User Interfaces 8, 399–427 (2014). https://doi.org/10.1007/s12193-014-0170-3