Interacting with Cyber-Physical Systems - Advancements in Gesture Control and Eye-based Human-Computer Interaction
Open access
Author
Date
2020
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
With an ever-increasing number of smart devices in the users' surroundings creating ubiquitous opportunities for interaction, enabling natural and intuitive interaction with digital systems is a key challenge in Human-Computer Interaction (HCI). Computing, sensing, and communication technology is steadily getting smaller, cheaper, and more interconnected, as witnessed by mobile and wearable devices of various forms. Smartwatches and fitness trackers are already ubiquitous, while smart glasses and augmented reality headsets such as the Microsoft HoloLens are closer to real products than to lab prototypes. Given this changing landscape of devices in our environment, current input modalities may limit the effective utilisation of the available information flow.
When people interact with each other, they use multiple modalities, including voice, body posture, hand gestures, facial expressions, and eye gaze. Touch interaction coupled with apps has made our smartphones today's universal interaction devices, recently extended with voice control. However, such input modalities are no longer sufficient for complex applications like augmented or virtual reality. In this dissertation, we broaden and facilitate the interface between technology and people and contribute to the following research areas:
i) Collocated Multi-user Gestural Interactions. Natural gestures are a promising input modality in HCI because they enrich the way we interact with complex systems. However, few works have explored this input technique when multiple users are physically collocated. We explore gestural interaction between multiple users who are physically close to one another. The proximity of users is detected through inaudible acoustic ranging, while in-air hand gestures are recognised by leveraging inertial sensors. To ensure scalability, the underlying communication protocol between users and devices is handled over Bluetooth. Through extensive evaluations, we demonstrate not only the robustness of our approach but also the feasibility of using off-the-shelf mobile and wearable devices. We first demonstrate our concept with HandshakAR, a wearable system that facilitates the exchange of digital information between two users. We then extend this concept to multiple users and present three practical application scenarios.
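To make the proximity-sensing idea concrete, the following minimal Python sketch illustrates two-way acoustic ranging between two nearby devices. It is not the thesis implementation; the function name, timing values, and the assumption that each device can timestamp both chirps through its own microphone are illustrative.

```python
# Hypothetical sketch of two-way acoustic ranging between two nearby devices.
# Device A emits an inaudible chirp; device B replies with its own chirp.
# Each device measures, via its own microphone, the time between the two
# chirps. The difference of the two intervals cancels the unknown reply delay
# and clock offsets, leaving twice the one-way propagation time.

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 °C


def estimate_distance(interval_a: float, interval_b: float) -> float:
    """Distance in metres between devices A and B.

    interval_a: on device A, time from hearing its own chirp to hearing B's reply.
    interval_b: on device B, time from hearing A's chirp to hearing its own reply.
    """
    one_way_time = (interval_a - interval_b) / 2.0
    return SPEED_OF_SOUND * one_way_time


# Example: intervals of 26.92 ms and 21.08 ms correspond to roughly 1 m.
print(f"{estimate_distance(0.02692, 0.02108):.2f} m")
```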
ii) Eye-based Human-Computer Interaction. We investigate eye gaze as a hands-free, high-bandwidth input modality and explore how users can augment their interaction capabilities with their gaze direction. We propose ubiGaze, a novel wearable system that enables attaching virtual content to any real-world object through gaze gestures, which are detected using a wearable eye tracker. While gaze gestures are less sensitive to accuracy problems or calibration shifts, many practical applications require calibrated eye trackers. However, existing calibration techniques are tedious and rely on special markers. We propose fingertip calibration, a novel method for head-mounted eye trackers in which users only have to point with their fingers at locations in the scene. This eliminates the need for additional assistance or specialised markers. Lastly, while eye trackers have become more accessible, the need for special-purpose equipment hinders many large-scale deployments. Therefore, we explore gaze-based interaction using a single off-the-shelf camera. We propose a method to detect pursuit eye movements, which have become widely popular because they enable spontaneous interaction. Our method combines appearance-based gaze estimation with optical flow in the eye region to jointly analyse eye movement dynamics in a single pipeline. Our results not only show the feasibility of our approach but also point the way towards new methods that only require standard cameras, which are readily available in an ever-increasing number of devices.
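As one way to make pursuit-based interaction concrete, the Python sketch below matches an estimated gaze trajectory against a moving on-screen target using per-axis Pearson correlation, a technique commonly used in smooth-pursuit interfaces. It is not the joint gaze-estimation and optical-flow pipeline described above; the window length and threshold are illustrative assumptions.

```python
# Illustrative correlation-based pursuit matching (not the thesis pipeline).
import numpy as np


def pursuit_match(gaze_xy: np.ndarray, target_xy: np.ndarray,
                  threshold: float = 0.8) -> bool:
    """Return True if the gaze trajectory appears to follow the target.

    gaze_xy, target_xy: arrays of shape (N, 2) covering the same time window,
    e.g. roughly one second of samples. Pearson correlation is computed per
    axis; the target counts as followed when both correlations exceed the
    threshold.
    """
    corr_x = np.corrcoef(gaze_xy[:, 0], target_xy[:, 0])[0, 1]
    corr_y = np.corrcoef(gaze_xy[:, 1], target_xy[:, 1])[0, 1]
    return bool(min(corr_x, corr_y) > threshold)


# Example: noisy gaze samples tracking a circular target over 30 frames.
t = np.linspace(0, 2 * np.pi, 30)
target = np.stack([np.cos(t), np.sin(t)], axis=1)
gaze = target + np.random.normal(scale=0.05, size=target.shape)
print(pursuit_match(gaze, target))  # True in most runs
```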
iii) Quantification of Visual Attention in Mobile HCI. Eye contact is a key measure of overt visual attention in mobile HCI as it enables understanding when, how often, or for how long users look at their devices. However, robustly detecting shifts of attention during everyday mobile interactions is challenging. Encouraged by recent advances in automatic eye contact detection, we provide a fundamental investigation into the feasibility of quantifying mobile visual attention. We identify core challenges and sources of errors associated with sensing visual attention in the wild, including the impact of face and eye visibility, the importance of robust head pose estimation, and the need for accurate gaze estimation. Guided by our analysis, we present a method to accurately and robustly detect eye contact in images captured with the front-facing camera of common mobile devices. Based on our evaluations, we show how eye contact is the fundamental building block for calculating higher-level attention metrics and, as such, enables studying visual attention in the wild. Finally, we present the Everyday Mobile Visual Attention (EMVA) dataset and quantitative evaluations of the visual attention of mobile device users in situ, i.e. while they use their devices during their everyday routine. Using our proposed method for eye contact detection, we quantify the highly dynamic nature of everyday visual attention allocation across users, mobile applications, and usage contexts. We then discuss our findings, which highlight the potential of and inform the design of future mobile attentive user interfaces.
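To illustrate how a per-frame eye contact signal can serve as a building block for higher-level attention metrics, the short Python sketch below derives glance counts and attention spans from a binary sequence. The metric names and frame rate are illustrative assumptions, not the metrics defined in the thesis.

```python
# Illustrative aggregation of a binary per-frame eye contact signal into
# simple attention metrics (hypothetical names, not the thesis API).
from itertools import groupby
from typing import List


def attention_metrics(eye_contact: List[bool], fps: float) -> dict:
    """Summarise a per-frame eye contact sequence.

    eye_contact: one boolean per camera frame (True = user looks at device).
    fps: frame rate of the front-facing camera stream.
    """
    # Lengths (in frames) of contiguous runs where eye contact is detected.
    spans = [sum(1 for _ in run) for value, run in groupby(eye_contact) if value]
    span_seconds = [n / fps for n in spans]
    return {
        "num_glances": len(span_seconds),
        "mean_attention_span_s": sum(span_seconds) / len(span_seconds) if span_seconds else 0.0,
        "total_attention_s": sum(span_seconds),
    }


# Example: one second of frames at 10 fps with two glances (0.3 s and 0.2 s).
signal = [True] * 3 + [False] * 4 + [True] * 2 + [False] * 1
print(attention_metrics(signal, fps=10.0))
# {'num_glances': 2, 'mean_attention_span_s': 0.25, 'total_attention_s': 0.5}
```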
Along with this dissertation, we deliver open-source implementations of some of our contributions. The fingertip calibration method is available as a plugin for the open-source Pupil platform. Additionally, the EMVA dataset, which contains around 472 hours of front-facing camera video snippets from 32 participants recorded over more than two weeks of real-life use, together with associated usage logs, interaction events, and sensor data, is publicly available to support future work in this area of research.
Permanent link
https://doi.org/10.3929/ethz-b-000492658
Publication status
published
Publisher
ETH Zurich
Subject
Human-computer interaction (HCI); Ubiquitous Computing; Eye Tracking; Gesture Interaction; Augmented Reality (AR)
Organisational unit
03528 - Mattern, Friedemann (emeritus) / Mattern, Friedemann (emeritus)