Abstract
Universal-access multi-modal browsing is emerging as a “killer” technology that promises broader and more flexible access to information, faster task completion, and an improved user experience. By combining the strengths of GUI and speech interaction according to the circumstances, hardware capabilities, and environment, multi-modality offers application developers a scalable blend of input and output channels that can accommodate any user, device, and platform. This article describes a flexible multi-modal browser architecture, named Ferda the Ant, which reuses uni-modal browser technologies available for VoiceXML, WML, and HTML browsing. A central component, the Virtual Proxy, acts as a synchronization coordinator. The architecture can be deployed either in a single-client configuration or with the browser components distributed across the network. We have defined and implemented a synchronization protocol that communicates changes occurring in the context of one component browser to the other browsers participating in the multi-modal browser framework. Browser wrappers implement the required synchronization protocol functionality at each component browser. The component browsers comply with existing content-authoring standards, and we have designed a set of markup-level authoring conventions that facilitate maintaining browser synchronization.
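The article itself does not include code; as an illustration of the loosely-coupled design sketched in the abstract, the following Python fragment shows one plausible shape for the Virtual Proxy acting as a synchronization coordinator: each browser wrapper reports context changes to the proxy, which relays them to every other registered component browser. All class and method names here (`VirtualProxy`, `BrowserWrapper`, `notify`, `apply`) are hypothetical and not taken from the paper.

```python
class BrowserWrapper:
    """Adapts a uni-modal browser (HTML, WML, or VoiceXML) to the
    synchronization protocol. Here it simply records applied events."""
    def __init__(self, modality):
        self.modality = modality
        self.proxy = None
        self.applied = []

    def notify(self, event):
        # Called when the wrapped browser's context changes
        # (e.g. the user filled a form field in the GUI view).
        self.proxy.broadcast(self, event)

    def apply(self, event):
        # Called by the Virtual Proxy to replay a change that
        # originated in another modality.
        self.applied.append(event)


class VirtualProxy:
    """Central synchronization coordinator: relays each context
    change to every component browser except its originator."""
    def __init__(self):
        self.wrappers = []

    def register(self, wrapper):
        wrapper.proxy = self
        self.wrappers.append(wrapper)

    def broadcast(self, source, event):
        for w in self.wrappers:
            if w is not source:
                w.apply(event)


# Usage: a change made in the HTML browser is mirrored to the
# VoiceXML and WML browsers, keeping all three views in step.
proxy = VirtualProxy()
html, voice, wml = (BrowserWrapper(m) for m in ("html", "voicexml", "wml"))
for w in (html, voice, wml):
    proxy.register(w)

html.notify({"field": "city", "value": "Prague"})
```

The design choice this sketch reflects is the loose coupling emphasized in the abstract: the proxy knows nothing about markup languages, and each wrapper translates between its browser's native events and the shared protocol.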
Cite this article
Kleindienst, J., Seredi, L., Kapanen, P. et al. Loosely-coupled approach towards multi-modal browsing. UAIS 2, 173–188 (2003). https://doi.org/10.1007/s10209-003-0047-9
Keywords
- Multi-modal
- Browser
- VoiceXML
- HTML
- WML
Abbreviations
- MM, multi-modal
- DOM, Document Object Model
- VP, Virtual Proxy
- GUI, Graphical User Interface
- NLU, Natural Language Understanding
- WML, Wireless Markup Language
- HTML, HyperText Markup Language
- WWW, World-Wide Web
- WAP, Wireless Application Protocol
- W3C, World-Wide Web Consortium
- VoiceXML, Voice eXtensible Markup Language
- COM, Component Object Model
- HTTP, HyperText Transfer Protocol
- API, Application Programming Interface
- UI, User Interface
- FIA, Form Interpretation Algorithm