Special Section on Touching the 3rd Dimension
Adapting user interfaces for gestural interaction with the flexible action and articulated skeleton toolkit

https://doi.org/10.1016/j.cag.2012.11.004

Abstract

We present the Flexible Action and Articulated Skeleton Toolkit (FAAST), a middleware software framework for integrating full-body interaction with virtual environments, video games, and other user interfaces. The toolkit provides a complete end-to-end solution that includes a graphical user interface for custom gesture creation, sensor configuration, skeletal tracking, action recognition, and a variety of output mechanisms to control third party applications, allowing virtually any PC application to be repurposed for gestural control even if it does not explicitly support input from motion sensors. To facilitate intuitive and transparent gesture design, we define a syntax for representing human gestures using rule sets that correspond to the basic spatial and temporal components of an action. These individual rules form primitives that, although conceptually simple on their own, can be combined both simultaneously and in sequence to form sophisticated gestural interactions. In addition to presenting the system architecture and our approach for representing and designing gestural interactions, we describe two case studies that evaluated the use of FAAST for controlling first-person video games and for improving the accessibility of computing interfaces for individuals with motor impairments. This work thus represents an important step toward making gestural interaction more accessible for practitioners, researchers, and hobbyists alike.

Highlights

► We present a middleware software framework for designing gestural interactions.
► The toolkit allows third party software to be repurposed for full-body interaction.
► We define a syntax for representing human motion with spatio-temporal rule sets.
► We present a case study demonstrating gestural control of first-person video games.
► We present a clinical case study of user interfaces for the motor impaired.

Introduction

Recent advances in low-cost depth sensing technology have led to a proliferation of consumer electronics devices that can sense the user's body motion. The release of the Microsoft Kinect in late 2010 sparked the rapid formation of a large and active community that has explored a myriad of uses, ranging from informal hobbyist “hacks” to scientific research projects and commercial applications. However, despite the widespread accessibility of full-body motion sensing devices, designing intuitive and powerful gestural interactions remains a challenge for developers. Notably, although the Kinect holds the record for the fastest-selling consumer electronics device in history, sales of many commercial Kinect for Xbox 360 game titles have been poor, which has been partially attributed to the lack of well-designed games that integrate body motion seamlessly into the experience [1], [2]. Indeed, research has shown that performing physical arm movements and gestures can have a profound impact on the user's attitudinal and emotional responses to visual stimuli [3], [4], [5]. These observations point to the need for both a theory of “gesture design” and the tools to enable the creation and customization of gestural interactions for 3D user interfaces and interactive media.

An important motivator for our work is the application of video game technology to advances in rehabilitation [6] and health [7]. While the clinical value of leveraging motion gaming technology has received increased recognition in recent years, these applications pose several notable challenges for designers. Unlike in commercial games, body-based control in a clinical setting is not “one-size-fits-all”: it must be customizable based on each patient's medical needs, range of motion, and motivation level. For example, a client with impaired arm movement would require a therapy game that encourages motion just outside the boundary of comfort, but not so far that achieving the required body pose becomes overly frustrating or impossible. Thus, the gestural interactions need to be designed on a per-client basis by the clinician, who may not possess intimate technical knowledge or programming skills. Furthermore, these interactions need to be easily and immediately adjustable as the patient improves or encounters problems.

To facilitate the integration of full-body control with third party applications and games, we developed a middleware software framework known as the Flexible Action and Articulated Skeleton Toolkit (FAAST). The toolkit enables the adaptation of existing interfaces for gestural interaction even though they were never intended to support input from motion sensors. To achieve the goal of intuitive and transparent gesture design, we defined a syntax for representing complex gestural interactions using rule sets that correspond to the basic spatial and temporal components of an action. These “action primitives” are represented as plain English expressions so that their meaning is immediately discernible to both technical and non-technical users, and they are analogous to interchangeable parts on an assembly line: generic descriptors that can be reused to replicate similar gestural interactions in two completely different end-user applications. FAAST provides a complete end-to-end framework that includes a graphical user interface for custom gesture creation, sensor configuration, skeletal tracking, action recognition, and a variety of output mechanisms to control third party applications such as 3D user interfaces and video games. FAAST can either be used to support the development of original motion-based user interfaces from scratch or to repurpose existing applications by mapping body motions to keyboard and mouse events, and thus represents an important step toward making gestural interaction design and development more accessible for practitioners, researchers, and hobbyists alike. In this paper, we present the FAAST system architecture (Section 3), our approach for representing and designing gestural interactions (Section 4), supported output modalities for manipulating arbitrary user interfaces (Section 5), and two case studies in which FAAST was evaluated within a specific application domain: controlling first-person video games for entertainment and increasing user interface accessibility for individuals with motor impairments (Section 6).
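To make the core idea concrete, the following minimal sketch (written in Python with hypothetical names that do not reflect FAAST's actual API) shows a spatial rule expressed in near-plain-English terms and bound to a simulated keyboard event, so that an unmodified first-person game could be driven by body motion:

# Hypothetical sketch only; names and structures are illustrative, not FAAST's real API.
from dataclasses import dataclass

@dataclass
class Rule:
    body_part: str     # e.g. "torso", "left_hand"
    relation: str      # e.g. "lean_forward", "in_front_of"
    reference: str     # e.g. "vertical", "left_shoulder"
    threshold: float   # angle in degrees or distance in inches

@dataclass
class Binding:
    rule: Rule
    output_key: str    # virtual key held down while the rule is satisfied

# "lean forward at least 15 degrees" -> hold 'w' (walk forward in most first-person games)
walk_forward = Binding(Rule("torso", "lean_forward", "vertical", 15.0), "w")

Because the rule refers only to generic body descriptors and a generic key, the same binding could be reused unchanged in a different application that also moves forward with the 'w' key.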

Section snippets

Gesture recognition

Computational analysis of human motion typically requires solving three non-trivial problems: detection, tracking, and behavior understanding [8]. In our system, user detection and skeletal tracking are provided by the software libraries from OpenNI and Microsoft Research, so FAAST focuses on recognizing the action being performed by the tracked user and generating an appropriate output. The quantity of literature focusing on action and gesture recognition from the computer vision and

System overview

FAAST is designed as middleware between the depth-sensing camera skeleton tracking libraries and end-user applications. Currently supported hardware devices include the Kinect sensor using the Microsoft Kinect for Windows SDK and any OpenNI-compliant sensor using the NITE skeleton tracking library from PrimeSense. Fig. 1 illustrates the toolkit's architecture. To make the toolkit as device agnostic as possible, communication with each skeleton tracker is split into separate modules that are
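As a rough illustration of this device-agnostic design (a sketch under assumed names, not the toolkit's real module interface), each sensor backend can implement a common skeleton-tracker contract so that the gesture recognition and output stages never touch device-specific code:

# Hypothetical interface; the concrete backends below return placeholder data.
from abc import ABC, abstractmethod
from typing import Dict, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) position in sensor coordinates

class SkeletonTracker(ABC):
    @abstractmethod
    def start(self) -> None:
        """Connect to the sensor and begin tracking."""

    @abstractmethod
    def poll_joints(self) -> Dict[str, Joint]:
        """Return the latest joint positions, keyed by joint name."""

class KinectSDKTracker(SkeletonTracker):
    def start(self) -> None:
        print("connecting via the Kinect for Windows SDK (placeholder)")

    def poll_joints(self) -> Dict[str, Joint]:
        return {"head": (0.0, 0.9, 2.1), "left_hand": (-0.3, 0.2, 1.8)}

class OpenNITracker(SkeletonTracker):
    def start(self) -> None:
        print("connecting via OpenNI/NITE (placeholder)")

    def poll_joints(self) -> Dict[str, Joint]:
        return {"head": (0.0, 0.9, 2.1), "left_hand": (-0.3, 0.2, 1.8)}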

Representing and designing gestural interactions

In this section, we describe our representation scheme for the individual components that compose gestural interactions, followed by FAAST's capabilities for designing complex gestures using this core mechanic.
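The following sketch illustrates the combination principle under assumed names (it is not FAAST's implementation): individual spatial constraints are evaluated against the tracked skeleton, a set of constraints evaluated simultaneously forms a pose, and an ordered list of poses with a timing bound forms a sequential gesture.

# Illustrative sketch; joint positions assumed to be in meters with y pointing up.
from typing import Callable, Dict, List, Tuple

Joint = Tuple[float, float, float]
Skeleton = Dict[str, Joint]
Constraint = Callable[[Skeleton], bool]

def above(part: str, reference: str, inches: float) -> Constraint:
    """True when `part` is at least `inches` above `reference` along the y axis."""
    return lambda s: (s[part][1] - s[reference][1]) * 39.37 >= inches

def pose(constraints: List[Constraint]) -> Constraint:
    """Simultaneous combination: every constraint must hold at the same time."""
    return lambda s: all(c(s) for c in constraints)

class SequentialGesture:
    """Sequential combination: the poses must be satisfied in order, each
    within `timeout` seconds of the previous one."""

    def __init__(self, steps: List[Constraint], timeout: float = 1.0):
        self.steps, self.timeout = steps, timeout
        self.index, self.last_time = 0, 0.0

    def update(self, skeleton: Skeleton, now: float) -> bool:
        if self.index > 0 and now - self.last_time > self.timeout:
            self.index = 0                       # too slow: restart the sequence
        if self.steps[self.index](skeleton):
            self.index, self.last_time = self.index + 1, now
            if self.index == len(self.steps):
                self.index = 0
                return True                      # full sequence recognized
        return False

# Simultaneous combination: both hands at least two inches above the head.
raise_both_hands = pose([above("left_hand", "head", 2.0),
                         above("right_hand", "head", 2.0)])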

Manipulating arbitrary user interfaces

After designing the input criteria for gesture activation, it is necessary to define the output that FAAST should generate when the user performs a gesture. Similar to the representation scheme for gestural input, FAAST provides a robust set of output events, which can be combined simultaneously and sequentially. This approach allows the user to design sophisticated macros that can manipulate arbitrary user interfaces in a variety of ways, ranging from simple events to complex behaviors. All
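For illustration (with hypothetical event names; FAAST's actual output is injected at the operating-system level so the target application simply sees ordinary input), an output macro can be modeled as an ordered list of simulated input events that is replayed whenever the triggering gesture is recognized:

# Hypothetical sketch; a real backend would inject events into the OS input queue.
import time
from dataclasses import dataclass
from typing import List

@dataclass
class OutputEvent:
    kind: str        # "key_press", "key_release", "mouse_move", ...
    argument: str    # key name, or "dx,dy" for mouse motion
    delay_ms: int    # pause after the event, used for sequential macros

def run_macro(events: List[OutputEvent]) -> None:
    """Replay the macro; here the events are only printed."""
    for e in events:
        print(f"{e.kind}({e.argument})")
        time.sleep(e.delay_ms / 1000.0)

# A "jump" gesture mapped to a brief press and release of the space bar.
jump_macro = [OutputEvent("key_press", "space", 50),
              OutputEvent("key_release", "space", 0)]
run_macro(jump_macro)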

Case studies

Starting as a very simple toolkit, FAAST evolved gradually based on experimentation in a variety of contexts and user feedback from the wider community. In this section, we detail the observations, conclusions, and limitations of using FAAST to adapt user interfaces for gestural interaction in two specific areas: (1) control of first-person video games for entertainment and (2) improving user interface accessibility for individuals with motor impairments.

Conclusion

In this paper, we describe a core mechanic for designing and customizing gestural interactions and present an integrated toolkit that enables the adaptation of existing user interfaces for body-based control. By providing gesture design tools that are transparent and easily discernible to non-technical users, along with the capability of repurposing third party applications, FAAST is particularly useful not only for entertainment purposes, but also for researchers and clinicians working in

References (27)

  • P. Turaga et al. Machine recognition of human activities: a survey. IEEE Trans Circuits Syst Video Technol (2008)
  • S. Mitra et al. Gesture recognition: a survey. IEEE Trans Syst Man Cybern Part C (2007)
  • Yamato J, Ohya J, Ishii K. Recognizing human action in time-sequential images using hidden Markov model. In: IEEE...