A Framework for Optimal In-Air Gesture Recognition in Collaborative Environments
Open access
Author
Date
2020
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
Hand gestures play an important role in communication between humans, and increasingly in the interaction between humans and computers, where users employ them to manipulate digital content, provide input, and issue commands to a digital system.
Thanks to advances in computer vision and camera technology, in-air hand gestures can be tracked without instrumenting the hands. This enables a wide variety of interesting and powerful use cases.
First, hand gestures that occur naturally during human-to-human communication can be tracked and interpreted. This has been extensively researched and applied, for example for communicating deictic gestures to remote participants. Most such solutions rely on extensive visual feedback, for example by showing a remote participant's hand, or even their full body, to their remote partners. While useful for many scenarios, such heavy reliance on visual feedback limits the usability and accessibility of these solutions, for example for blind and visually impaired (BVI) participants, or for scenarios where screen real estate is limited.
Even when used for human-computer interaction, in-air interfaces rely on visual feedback. Because in-air gestures are ephemeral and provide no haptic feedback, it is difficult for a new user to perform them correctly. A typical approach to this problem is to draw the hand trajectory on the display, which causes distraction, especially when multiple users sharing a single display interact with the system simultaneously. Another approach is to use a fast gesture classifier that gives the user quick feedback, even shortly before the gesture is finished, provided it is sufficiently distinct. Due to the way most current classifiers are designed, this feedback is mainly limited to reporting whether the gesture could be classified and, if so, to which class it belonged. Such feedback is of limited use: the only thing the user can do after receiving it is to repeat the gesture if it failed, while why it failed and how their performance could be improved remains unknown.
This thesis proposes methods for using in-air gestures to enhance digital collaboration without heavy reliance on visual feedback. This is especially useful for collaborative scenarios in which some participants have limited access to the visual channel, most notably BVI and remote participants, and for scenarios in which the display in the collaborative environment is crowded with content, so that showing large visual cues is not desirable.
Specifically, this thesis addresses two main challenges:
* How to communicate in-air gestures, specifically deictic gestures, to blind and visually impaired participants as well as remote participants, while minimizing (or eliminating) the need for visual feedback. For BVI participants, this is achieved by tracking the deictic gestures of sighted participants, deciding whether a deictic gesture is being performed, and then communicating the target of the gesture to the BVI participants through a Braille display or a screen reader (a simplified, hypothetical sketch of this target mapping appears after this list). For remote participants, this is achieved by showing the target of the pointing gesture with a small highlighter on the screen, and by allowing the remote participant to control the opacity of the visual feedback when more complex visual feedback is necessary.
* How to use in-air gestures for human-computer interaction in collaborative scenarios while minimizing the use of visual feedback. This is achieved by proposing a new gesture recognition algorithm that provides fast, useful, and non-distracting feedback for in-air gestures. The algorithm always keeps the user informed about the state of the gesture recognizer and, through unobtrusive visual cues, tells them what to do next to get closer to completing a gesture. Moreover, the proposed algorithm is independent of the speed, scale, and orientation of the gestures (a sketch of one such normalization appears after this list). This allows users to perform gestures from different distances and angles relative to the camera, at a speed they are comfortable with, giving them ample opportunity to learn how to perform gestures. Additionally, a new algorithm for building large gesture sets for in-air interaction from a smaller set of gestures is introduced, reducing the number of new gestures users have to learn. The resulting gestures are also guaranteed to be easily detectable by the proposed gesture recognizer.
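The following is a minimal, hypothetical Python sketch of how a tracked pointing gesture could be mapped to an on-screen target and described non-visually, as referenced in the first point above. The Artifact class, the coordinate convention (display plane at z = 0), and the printed description are illustrative assumptions standing in for the actual hand tracker and the Braille display or screen reader output; this is not the thesis implementation.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """A hypothetical on-screen item with an axis-aligned bounding box (display units)."""
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def pointing_target(origin, direction, artifacts):
    """Intersect the pointing ray with the display plane (z = 0) and return the
    artifact under the intersection point, if any."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-9:            # ray parallel to the display: no target
        return None
    t = -oz / dz
    if t < 0:                     # pointing away from the display
        return None
    px, py = ox + t * dx, oy + t * dy
    return next((a for a in artifacts if a.contains(px, py)), None)


# Toy example: a fingertip 40 cm in front of the display, pointing towards a sticky note.
artifacts = [Artifact("sticky note: 'budget'", 10, 10, 8, 8),
             Artifact("diagram: system overview", 30, 20, 15, 10)]
target = pointing_target((12, 12, 40), (0.05, 0.05, -1.0), artifacts)
print(f"Pointing at: {target.name}" if target else "No target under the pointing ray")
```

In an accessible setup, the printed string would instead be sent to a Braille display or spoken by a screen reader.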
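The following is a minimal sketch of one common way to make a 2-D gesture trajectory comparable independently of speed, scale, and orientation: resampling to a fixed number of equidistant points, then translating, scaling, and rotating into a canonical frame. It is a generic normalization in the spirit of the invariances described above, not the recognizer proposed in the thesis; the function names and the choice of 32 sample points are assumptions.

```python
import math


def resample(points, n=32):
    """Resample the trajectory to n equidistant points, removing speed dependence."""
    pts = [tuple(p) for p in points]
    total = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    if total == 0:                        # degenerate trajectory: all points coincide
        return [pts[0]] * n
    step, acc, out = total / (n - 1), 0.0, [pts[0]]
    i = 1
    while i < len(pts) and len(out) < n:
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)              # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:                   # guard against floating-point shortfall
        out.append(pts[-1])
    return out


def normalize(points):
    """Translate the centroid to the origin, scale to unit radius, and rotate so the
    first point lies on the positive x-axis (scale and orientation invariance)."""
    pts = resample(points)
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    pts = [(x - cx, y - cy) for x, y in pts]
    radius = max(math.hypot(x, y) for x, y in pts) or 1.0
    pts = [(x / radius, y / radius) for x, y in pts]
    angle = math.atan2(pts[0][1], pts[0][0])
    c, s = math.cos(-angle), math.sin(-angle)
    return [(x * c - y * s, x * s + y * c) for x, y in pts]


# Two renditions of an "L" shape, drawn at different scales and sampling densities,
# end up nearly identical after normalization.
a = normalize([(0, 0), (0, 2), (0, 4), (1, 4), (2, 4)])
b = normalize([(0, 0), (0, 40), (20, 40)])
print(max(math.dist(p, q) for p, q in zip(a, b)))   # small residual distance
```

Trajectories normalized this way can be compared point-wise as they are being drawn, which is one way to give quick, incremental feedback rather than only a final class label.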
Finally, because studying these problems requires a setup capable of uninstrumented hand tracking, this thesis proposes cost-effective hardware setups for building collaborative environments with horizontal or vertical displays that are capable of tracking in-air gestures.
Permanent link
https://doi.org/10.3929/ethz-b-000449030
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Gesture recognition; Collaborative environments; Remote collaboration; In-air gestures
Organisational unit
03641 - Wegener, Konrad (emeritus) / Wegener, Konrad (emeritus)
08844 - Kunz, Andreas (Tit.-Prof.) / Kunz, Andreas (Tit.-Prof.)
Notes
This project was supported by SNSF project number CR21I2_138601 and CTI project number 16251.2 PFES-ES.