Vision-Based Technique and Issues for Multimodal Interaction in Augmented Reality

Published: 24 August 2015

Abstract

Although much progress has been made in multimodal interaction, most researchers still treat each modality, such as vision and speech, separately and integrate the results only at the application stage. This is because the roles of the individual modalities and their interactions are not yet fully quantified or precisely understood, and many issues remain in how the modalities are combined. This paper highlights the main vision problems identified in our review of multimodal applications. It gives an overview of the Augmented Reality (AR) technologies that contribute to most recent multimodal applications, and clusters vision techniques according to the natural human channels, such as face, gesture, and speech, that these applications most frequently use. The main contribution of this paper is to consolidate the principal issues and approaches in vision-based techniques and to study AR applications that have been developed within the context of multimodal interaction. We conclude with future directions.
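
To make the late-fusion pattern described above concrete, the sketch below shows one way separately recognized speech and gesture results can be merged at the application stage, by pairing events that occur close together in time, in the spirit of Bolt's "Put-That-There". It is a minimal sketch under assumed inputs: SpeechEvent, GestureEvent, and fuse_late are hypothetical names, not an API from any of the surveyed systems.

# Minimal late-fusion sketch: speech and gesture are recognized
# independently and merged at the application stage by temporal
# alignment. All names and types here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SpeechEvent:          # output of a hypothetical speech recognizer
    command: str            # e.g. "move", "delete"
    timestamp: float        # seconds since session start

@dataclass
class GestureEvent:         # output of a hypothetical gesture recognizer
    target_id: int          # id of the AR object the user pointed at
    timestamp: float

def fuse_late(speech_events, gesture_events, window=0.5):
    """Pair each spoken command with the pointing gesture closest in
    time, provided the two fall within `window` seconds of each other."""
    fused = []
    for s in speech_events:
        nearest = min(gesture_events,
                      key=lambda g: abs(g.timestamp - s.timestamp),
                      default=None)
        if nearest and abs(nearest.timestamp - s.timestamp) <= window:
            fused.append((s.command, nearest.target_id))
    return fused

# "delete" spoken at t=1.2 s while pointing at object 7 at t=1.3 s
print(fuse_late([SpeechEvent("delete", 1.2)], [GestureEvent(7, 1.3)]))
# -> [('delete', 7)]

A fixed time window is the simplest alignment rule; fuller systems also weight recognizer confidence scores and dialogue context, and the difficulty of choosing such rules is one source of the integration issues this review discusses.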




      Published In

      VINCI '15: Proceedings of the 8th International Symposium on Visual Information Communication and Interaction
      August 2015
      185 pages
      ISBN: 9781450334822
      DOI: 10.1145/2801040

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. Augmented Reality
      2. Multimodal Interaction
      3. Vision Technique

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      VINCI '15

      Acceptance Rates

      VINCI '15 Paper Acceptance Rate: 12 of 32 submissions, 38%
      Overall Acceptance Rate: 71 of 193 submissions, 37%

