Vision-Based Technique and Issues for Multimodal Interaction in Augmented Reality

Published: 24 August 2015

Abstract

Although much progress has been made in multimodal interaction, most researchers still treat each modality, such as vision and speech, separately and integrate the results only at the application stage. This is because the roles of the individual modalities and their interactions are not yet fully quantified or precisely understood, and many issues remain in how the modalities are combined. This paper highlights the main vision problems identified in our review of multimodal applications. It gives an overview of the Augmented Reality (AR) technologies that contribute to most recent multimodal applications, and clusters vision techniques according to the natural human channels, such as face, gesture, and speech, that these applications most frequently use. The main contribution of this paper is to consolidate the principal issues and approaches in vision-based techniques and to study AR applications that have been developed within the context of multimodal interaction. We conclude with future directions.
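
To make the late-fusion pattern described above concrete, the sketch below shows one way separately recognized speech and gesture results can be merged at the application stage, by pairing events that occur close together in time, in the spirit of Bolt's "Put-That-There". It is a minimal sketch under assumed inputs: SpeechEvent, GestureEvent, and fuse_late are hypothetical names, not an API from any of the surveyed systems.

# Minimal late-fusion sketch: speech and gesture are recognized
# independently and merged at the application stage by temporal
# alignment. All names and types here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SpeechEvent:          # output of a hypothetical speech recognizer
    command: str            # e.g. "move", "delete"
    timestamp: float        # seconds since session start

@dataclass
class GestureEvent:         # output of a hypothetical gesture recognizer
    target_id: int          # id of the AR object the user pointed at
    timestamp: float

def fuse_late(speech_events, gesture_events, window=0.5):
    """Pair each spoken command with the pointing gesture closest in
    time, provided the two fall within `window` seconds of each other."""
    fused = []
    for s in speech_events:
        nearest = min(gesture_events,
                      key=lambda g: abs(g.timestamp - s.timestamp),
                      default=None)
        if nearest and abs(nearest.timestamp - s.timestamp) <= window:
            fused.append((s.command, nearest.target_id))
    return fused

# "delete" spoken at t=1.2 s while pointing at object 7 at t=1.3 s
print(fuse_late([SpeechEvent("delete", 1.2)], [GestureEvent(7, 1.3)]))
# -> [('delete', 7)]

A fixed time window is the simplest alignment rule; fuller systems also weight recognizer confidence scores and dialogue context, and the difficulty of choosing such rules is one source of the integration issues this review discusses.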




      Published In

      VINCI '15: Proceedings of the 8th International Symposium on Visual Information Communication and Interaction
      August 2015
      185 pages
      ISBN: 9781450334822
      DOI: 10.1145/2801040

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. Augmented Reality
      2. Multimodal Interaction
      3. Vision Technique

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      VINCI '15

      Acceptance Rates

      VINCI '15 Paper Acceptance Rate: 12 of 32 submissions, 38%
      Overall Acceptance Rate: 71 of 193 submissions, 37%

