
Open access

Iteratively Designing Gesture Vocabularies: A Survey and Analysis of Best Practices in the HCI Literature

Authors:

Daniel Wigdor

ACM Transactions on Computer-Human Interaction (TOCHI), Volume 29, Issue 4

Article No.: 37, Pages 1 - 54

https://doi.org/10.1145/3503537

Published: 05 May 2022


Abstract

Gestural interaction has evolved from a set of novel interaction techniques developed in research labs to a dominant interaction modality used by millions of users every day. Despite its widespread adoption, the design of appropriate gesture vocabularies remains a challenging task for developers and designers. Existing research has largely used Expert-Led, User-Led, or Computationally-Based methodologies to design gesture vocabularies. These methodologies leverage the expertise, experience, and capabilities of experts, users, and systems to fulfill different requirements. In practice, however, none of these methodologies provide designers with a complete, multi-faceted perspective of the many factors that influence the design of gesture vocabularies, largely because a singular set of factors has yet to be established. Additionally, these methodologies do not identify or emphasize the subset of factors that are crucial to consider when designing for a given use case. Therefore, this work reports on the findings from an exhaustive literature review that identified 13 factors crucial to gesture vocabulary design and examines the evaluation methods and interaction techniques commonly associated with each factor. The identified factors also enable a holistic examination of existing gesture design methodologies from a factor-oriented viewpoint and highlight the strengths and weaknesses of each methodology. This work closes with proposals for two future research directions: developing an iterative, user-centered, and factor-centric gesture design approach, and establishing an evolving ecosystem of factors that are crucial to gesture design.

1 Introduction

Within Human–Computer Interaction (HCI), gestural input most often refers to the user’s finger, hand, arm, or other body part movements to express desired actions to computing systems. From early work exploring gestural input with a light pen [309] to research that proposed manipulating virtual objects using single finger gestures on a force and position sensitive screen [209] and more recent work that used mid-air gestures to enhance the creation of dynamic phenomena animations in virtual reality [25], the design, facilitation, and detection of gestural input has had a long history within HCI.

The methodologies used to design gesture vocabularies may be broadly categorized into three groups. With Expert-Led methodologies, seasoned gesture designers utilize their expertise and past experiences with gestural interaction to create mappings between physical actions and system responses [55, 98, 120]. Expert designers can quickly create gesture vocabularies that are simple, socially acceptable, and ergonomic using the knowledge they have gained from watching users and interacting with gesture-based systems themselves. Although this is one of the most common methods used, the resulting gesture vocabularies have been criticized for lacking discoverability or intuitiveness [362]. User-Led methodologies, on the other hand, employ participatory elicitation techniques to generate gesture vocabularies based on user behavior or feedback about which gestures map most naturally to a given command [229, 258, 362]. The elicited gestures that result when one draws on the “wisdom of the crowd” may be intuitive and discoverable by a target user, but may lack other characteristics: they may not be differentiable during recognition, may not fully leverage the input capabilities of a given input device, or may not be ergonomic during repeated, long-term use. The last group of methodologies, i.e., Computationally-Based methodologies, utilize algorithms to identify gestures that are easy for a system or interface to recognize and trivial for a designer to specify [16, 26, 106, 199]. While algorithms can generate a gesture vocabulary that is easy to recognize, it is difficult to algorithmically model other factors such as the learnability, transferability, or social acceptability of the generated gestures.

Although these methodologies all seek to leverage the unique expertise, backgrounds, and capabilities of different designers (i.e., experts, users, and computational models), the factors that a designer may wish to optimize often vary depending on the desired functionality, hardware capabilities, target users, or social context of an application. As a result, it is unlikely that the gestures designed using one methodology will fulfill all the requirements a designer may have. Furthermore, due to the lack of understanding, definition, and categorization of the relevant factors that are important to gesture design, many gesture vocabularies created using these methodologies contain gestures with socially unacceptable semiotics, gestures that are not safe for long-term use, or gestures that are not generalizable to other contexts or users [60, 220].

The goal of the present work is thus to provide an as-complete-as-possible listing of factors found to be important in the design of gestures, and to provide an index to the research that provides the field’s best-known methods for optimizing these factors. By identifying these factors, researchers and designers can develop a multi-faceted understanding of the many factors that are crucial to gesture design and understand the strengths and weaknesses of current gesture design methodologies along each of the identified factors.

To identify the factors crucial to the design of gesture vocabularies, over 1,600 gestural interaction research papers were surveyed and 288 papers within this body of literature that specifically focused on gesture design were synthesized further. This activity resulted in the identification, definition, and analysis of thirteen factors critical to gesture design including Situational factors (i.e., Context, Modality, Social Acceptability), Cognitive factors (i.e., Discoverability, Intuitiveness, Learnability, and Transferability), Physical factors (i.e., Efficiency, Complexity, Ergonomics, and Occlusion), and System factors (i.e., Feedback and Recognition). The holistic importance of the identified factors was explored by examining the three categories of gesture design methodologies from a factor-oriented viewpoint and highlighting the strengths and weaknesses of each methodology. Informed by these findings, we delineate two future research directions: a potential factor-centric gesture design approach wherein gesture designers identify, prioritize, design, evaluate, and refine the set of factors most related to their applications, and the establishment and maintenance of an evolving ecosystem of gesture design factors. We believe these future directions can further the practices and the understanding of gesture design in both the research community and industry.

2 Designing Gestures Today

Many methodologies have been proposed to assist in the creation of gesture vocabularies or gesture sets. Some methodologies rely exclusively on expert knowledge and experiences [6] or gather feedback and performance measures from users [6, 229] for gesture creation, whereas others emphasize how easy gestures will be to recognize [229] or use mathematical models and simulations [300]. Regardless of the methodology used, the general recommendation for the design of gesture vocabularies is that a resultant gesture vocabulary should contain interactions that are comfortable, efficient, easy to learn, and execute a user’s desired actions [228].

2.1 Expert-Led Methodologies

The most traditional method that has been used to design a gesture vocabulary employs a top-down approach, wherein designers or researchers rely on their knowledge and experience with a target input device, end user population, or intended user actions to develop a gesture vocabulary. This process initially involves the identification and collection of a set of requirements for an application. Techniques such as developing scenarios [308] or use cases [131], performing a task analysis [81], wireframing [126], or storyboarding [313] allow one to understand end users’ abilities, skills, environments, devices, and tasks. Using these requirements, a designer can then create a gesture vocabulary using gesture vocabularies that have been published by others [365], best practices and recommendations from the literature [376], commercial gesture vocabulary examples (e.g., Graffiti), metaphors or observations of analog interactions [119], or interaction designs that are based on a designer’s experiences [354]. Once the vocabulary has been developed, a user study is often performed to evaluate the gestures using a given population, environment, device, or task. Factors such as learnability, discoverability, ergonomics, and social acceptability, among others, are typically of interest [194, 202, 289, 352, 364]. For example, Wickeroth et al. proposed a Gesture Usability Scale, which extended the System Usability Scale [53] to measure the usability of gestures based on their learnability, ergonomics, complexity, and recognition [352].

A variant of this traditional methodology, proposed by Sturman and Zeltzer, utilized a five-stage method to assist designers in developing whole hand gestures [306]. With this method, a designer first determined if whole hand input was appropriate for their application by considering how natural, adaptable, and dexterous application movements needed to be. If deemed appropriate, the designer then consulted Sturman and Zeltzer’s taxonomy of whole hand input to determine which types of gestures would be best suited to their application. Next, they decomposed the tasks to be performed into primitive actions and used these actions to decide which individual gestures would be best to use. These decisions could be made using the existing literature, the previous experience of a designer, or observations of how the hand is normally used in the intended, or similar, tasks. Next, the designer decided which device(s) would be used by the target population and evaluated the gestural vocabulary using these devices. After the evaluation, Sturman and Zeltzer noted that one may need to iterate through the five-stage process many times to ensure that the whole vocabulary met the identified requirements.

Barclay et al.’s systems-based view of gesture design proposed that designers work through a quantitative model that progressively measured the quality and functionality of individual gestures and the system as a whole [34]. First, the quality of a gesture was computed by combining various weighted measurements of factors including fatigue, naturalness, duration, and accuracy. The results of each measure were then averaged to attain the overall quality of a given gesture. Then, it was recommended that a designer should derive a state transition model of each desired task within the target interface. Traversing the state transition model, a designer should then identify the pathway that is most typical, using videotaped data of users working with the interface. This view then recommended that a designer compute a functional quality score using this pathway. Lastly, a designer should compute the weighted average of all tasks within the system as an indicator of the overall quality of the system. The resulting value could then be used to determine the impact of changing individual gestures within the system or task-gesture mappings.
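The layered scoring described above can be illustrated with a short sketch. It is not Barclay et al.'s published model: the weight values, the assumption that every factor score is normalized to [0, 1] with 1 being best, and the choice to score a task as the mean quality of the gestures along its most typical pathway are all hypothetical, introduced here only to show how the three levels (gesture, task, system) might compose.

```python
from statistics import fmean

# Hypothetical factor weights; the source does not give numeric values.
WEIGHTS = {"fatigue": 0.3, "naturalness": 0.3, "duration": 0.2, "accuracy": 0.2}


def gesture_quality(scores, weights=WEIGHTS):
    """Weighted average of per-factor scores (assumed normalized to [0, 1])."""
    total = sum(weights.values())
    return sum(weights[f] * scores[f] for f in weights) / total


def task_quality(pathway, qualities):
    """Assumed functional quality of a task: the mean quality of the gestures
    along its most typical pathway through the state transition model."""
    return fmean(qualities[g] for g in pathway)


def system_quality(task_scores, task_weights):
    """Weighted average of task scores, as an indicator of overall quality."""
    total = sum(task_weights[t] for t in task_scores)
    return sum(task_weights[t] * task_scores[t] for t in task_scores) / total


if __name__ == "__main__":
    swipe = {"fatigue": 0.8, "naturalness": 0.9, "duration": 0.7, "accuracy": 0.6}
    pinch = {"fatigue": 0.6, "naturalness": 0.7, "duration": 0.8, "accuracy": 0.9}
    qualities = {"swipe": gesture_quality(swipe), "pinch": gesture_quality(pinch)}
    tasks = {"browse": task_quality(["swipe", "swipe", "pinch"], qualities)}
    print(system_quality(tasks, {"browse": 1.0}))
```

Under these assumptions, swapping one gesture for another changes only its entry in `qualities`, so the effect on the system-level value can be recomputed directly, which is the kind of what-if comparison the model is meant to support.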

Rossini proposed a parameter-based model inspired by communicative gestures [273]. After creating a gesture vocabulary (using an unspecified process), Rossini recommended focusing on the size of a gesture to determine its appropriateness for the intended population and context. Next, Rossini recommended deconstructing the gesture into its timing phases and attending to the position of the hands throughout execution. Once this was completed, Rossini recommended identifying the points of articulation of the gesture and the body space, or locus, involved in the gesture. Using such information, one should be able to determine the morphology of the gesture and use the morphology to evaluate and compare gestures using an end user population.

Although varying in the extent to which they abstract information and integrate the user within the feedback and evaluation phases, expert-led methodologies can enable designers to harness their knowledge and experience to quickly design and develop a gesture vocabulary. Dangers relating to unnatural hand or limb postures and repetitive movements can also be avoided as experienced researchers or designers should know to monitor for such characteristics when designing their vocabulary. As the designer is aware of the entire battery of tasks or functionality within the interface, they can also easily check for similarity issues within a gesture vocabulary to ensure that each gesture is distinct. A challenge with such methodologies, however, is that the evaluation methodologies that are used, or the extent to which users are part of the process, can hamper opportunities to understand the learnability or discoverability of gesture vocabularies. As the designer is often not the end user of the gesture vocabulary, it can also be difficult to determine how intuitive the derived gesture and functionality mappings are.

2.2 User-Led Methodologies

Methodologies that employ users as part of the design process have become very popular as of late. Wobbrock et al. presented a technique to create gesture languages that focused on how easy they would be to discover [362]. Inspired by participatory design, this elicitation methodology for user-defined gestures (UDGs) does not rely on the expertise of a designer. Instead, an experimenter showed everyday users the state of a user interface before and after potential input and then asked them to describe or perform the gesture that they would expect to yield the change in state. Rather than utilizing an interface, some experimenters described an outcome and asked users to mime or describe the gesture that would correspond to that outcome [362]. Once the experimenters collected responses from several users, they computed an agreement score to determine the “degree of consensus” between users with respect to the elicited gestures [200, 361, 362]. Gestures exhibiting higher agreement were believed to be most appropriate for one’s application as they were identified by the most users as being an appropriate gesture for a given action.
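As an illustration of how such a consensus measure can be computed, the following is a minimal sketch that assumes the commonly cited form of the agreement score: for each referent (command), the proposals are partitioned into groups of identical gestures, and the squared proportions of the groups are summed; the overall score is the mean over referents. The sketch also assumes that an experimenter has already assigned the same label to proposals judged to be the same gesture; the function names and labels are hypothetical.

```python
from collections import Counter


def agreement_score(proposals):
    """Agreement for one referent.

    `proposals` holds one gesture label per participant. Identical labels
    form a group, and the score is the sum of squared group proportions.
    """
    n = len(proposals)
    if n == 0:
        raise ValueError("at least one proposal is required")
    return sum((count / n) ** 2 for count in Counter(proposals).values())


def overall_agreement(elicited):
    """Mean per-referent agreement; `elicited` maps referent -> proposals."""
    return sum(agreement_score(p) for p in elicited.values()) / len(elicited)


if __name__ == "__main__":
    elicited = {
        "delete": ["swipe"] * 12 + ["tap"] * 3 + ["pinch"] * 3 + ["hold"] * 2,
        "zoom in": ["pinch"] * 20,
    }
    for referent, proposals in elicited.items():
        print(referent, agreement_score(proposals))
    print("overall", overall_agreement(elicited))
```

With this formulation a referent scores 1 when every participant proposes the same gesture and approaches 1/n when all n participants propose different gestures, so the score depends on how the experimenter groups similar proposals.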

The main advantage of such elicitation techniques is that the resulting gesture vocabularies are generated directly from user behavior and beliefs about the gestures that would most naturally map to a given action. By leveraging the “wisdom of the crowd”, it is argued that the elicited gestures would eventually converge toward the gesture that would most likely be initially guessed or tried by an everyday user. Furthermore, because participants devise and perform each gesture without guidance, it is assumed that this methodology mimics users’ first attempts at a gesture in the wild. The technique has also been reported to help achieve a “preferred” gesture vocabulary, since participants may consider their preferences, in addition to their first-blush reactions, before reporting their chosen gesture to experimenters [362].

This elicitation methodology has been widely adopted since the publication of Wobbrock et al.’s article in 2009, being used to create gestures for applications ranging from freehand TV control [212, 323, 383] and smart glasses [317] to unmanned aerial vehicles [249] and deformable displays [312]. Perhaps this widespread adoption is due to the academic and industry support for participatory and user-centered design, where the integration of users into a design process can help identify errors, ineffective solutions, suboptimal strategies, or unique requirements and challenges that could impact an interface before its final development or deployment [70]. For more details, we refer readers to the systematic review conducted by Villarreal Narvaez et al., which surveyed 216 gesture elicitation studies [334].

Preceding the work of Wobbrock et al., Nielsen et al. proposed the “human-based approach” to create gesture vocabularies [229]. With this approach, a researcher first identified the functions or features of an interface that gestures would be mapped to. The researcher then walked participants through a variety of scenarios and asked them to describe how they would invoke specific functionality or had participants use the interface within a Wizard-of-Oz paradigm. As the scenarios were videotaped, the footage could be reviewed to identify how consistently and frequently different gestures were used. To determine the gestures that constituted the vocabulary, a researcher attended to the frequency and duration of gesture performances, the internal forces a gesture posture created, as well as the effects that the movements could have on the hand. Once the gesture vocabulary was composed, this approach recommended evaluating the gestures for guessability, memorability, and ergonomics. Earlier iterations of Nielsen et al.’s human-based approach also recommended the use of guided drawing tests to match gestures to desired functionality and functionality to specific gestures [229].

Pyryeskin et al. utilized Wobbrock et al.’s approach within their work to design in-air tabletop gestures [258]. However, they noted that such a methodology does not enable one to evaluate users’ performance of gestures or the ability for gestures to be recognized with limited hardware. They thus conducted a follow-up study that used a target selection task with elicited gestures as input. Metrics measured during this experiment included gesture speed, gesture accuracy, and participant preferences. This two-study methodology enabled user input to be solicited and later refined per the requirements of the hardware that was available.

More recently, Wu et al. proposed a variant of Wobbrock et al.’s approach that integrated elicitation within a four-stage process [372]. In Stage 1, designers were encouraged to utilize semi-structured interviews to conduct a requirements analysis of an interface. Stage 2 utilized an elicitation study protocol that employed think-aloud methods, in addition to Wizard-of-Oz techniques (if appropriate), to determine two distinct sets of gestures for a given interface. Stage 3 used these gesture sets for a benchmark test that consisted of a degree of matching activity, memorability test, comfort test, and post-test questionnaire. The result of this third stage was a refined gesture vocabulary. Finally, Stage 4 made use of a personalized study design that employed the desired hardware and software to ensure that the gestures met the needs identified in the requirements analysis, the recognition algorithms were correct, and the gestures were evaluated for effectiveness, efficiency, and user satisfaction. The authors noted that at any point during the process, one may need to go back to a previous stage to make refinements.

Löcken et al. also proposed a four-stage process that integrated an elicitation methodology [191]. Unlike in Wu et al.’s model, gesture designers involved end users and other stakeholders when first identifying the usage context and functions of a system. Once the requirements were identified, a participatory elicitation study was conducted to iterate on the identified functionality and collect gestures that could constitute the final gesture vocabulary. Designers then developed multiple gesture vocabularies from the collected gestures and used them in the last step to conduct a comparative evaluation between the proposed gesture vocabularies. This evaluation was thought to home in on an optimal set of gestures, which may be a combination of gestures from different vocabularies. The authors recommended that this evaluation should include participants and elicit qualitative feedback to determine the optimal gesture vocabulary.

User-led methodologies enable researchers and designers to obtain first-hand feedback from their target population and understand potential discoverability and comprehension issues very quickly. As each of the elicitation variants, including Nielsen et al.’s human-based approach, emphasized the need for appropriate mappings between gestures and functionality, it is unsurprising that so many have flocked to this method [334]. The major difference between these variants, however, is when users become engaged in the process (i.e., Löcken et al. invite them as part of the requirements analysis whereas Wu et al., Nielsen et al., and Wobbrock et al. begin engagement after the requirements analysis is completed). These variants also differ in the outcomes of the engagement (i.e., requirements, multiple gesture languages, empirical measurements, and so on) and in the overarching goals that are of interest (e.g., discoverability, memorability, comfort, and so on).

Although such elicitation methodologies have become popular, the creation of gestures by uninformed or novice users does not come without challenges. Placing the onus on end users to create gestures which may be used by a wide population nullifies the expertise of designers, who often take a holistic view of a user interface and consider many facets that are important to gestural input (e.g., the impact of repeated usage [229], interference between gestures within a vocabulary [381], the learnability of a vocabulary [186, 384], and the reliable detection and differentiation of gestures [381]). Users frequently design gestures that are appropriations of existing gestures or are limited by the technological capabilities of their past experiences, causing them to neglect other important factors such as the learnability, legacy bias, social acceptability, recognition, or the system feedback that would be provided [214].

2.3 Computationally-Based Methodologies

Rather than employing expert knowledge or users as part of the gesture design process, a growing number of methodologies have focused on developing algorithms or models that result in gestures that can be easily specified by a designer and are trivial for a system or interface to recognize.

2.3.1 Design-By-Demonstration.

Early work on gesture interfaces focused on the design of tools to improve the specification, mapping, and recognition of stroke-based input using templates or demonstrations of gestures that were performed by gesture designers. Most often, pattern matching or learning algorithms were then employed to differentiate gestures based on their similarity to the provided templates or recognize gesture input.

For example, Rubine’s GRANDMA toolkit [278] enabled a designer to record multiple instances of a target gesture and associate the gesture to a user interface element via gesture handlers. Once the gesture handlers were specified, the designer could evaluate the recognition accuracy of the gesture using various recognition algorithms. Similarly, with the Gestural Interface Designer (GID) [16], a designer could place user interface elements within an interface and map semantic actions (i.e., gestures) to each element. It is unclear how one would determine which gestures to use, but technical descriptions of how gestures would be recognized and represented within the system were provided. Similarly, systems such as Agate [172], the Gesture Design Tool [195], and iGesture [292] enabled designers to provide templates and examples of gestures to create gesture vocabularies, as well as evaluate the performance of different classifiers via descriptive statistics, recognition results, and classification matrices. To facilitate this process, GHoST, proposed by Vatavu et al., utilized visualization techniques to aid in the analysis of gestures, by visualizing recognition errors and characteristics of gesture articulation patterns [327].
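The general idea of differentiating gestures by their similarity to designer-provided templates can be sketched as a nearest-template classifier. This is an illustrative simplification and not the recognizer used by any of the tools cited above: it assumes each stroke has already been resampled to the same number of (x, y) points, it normalizes only for translation (not scale or rotation), and all names are hypothetical.

```python
import math


def centered(points):
    """Translate a stroke so that its centroid sits at the origin."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def path_distance(a, b):
    """Mean point-to-point Euclidean distance between equal-length strokes."""
    if len(a) != len(b):
        raise ValueError("strokes must be resampled to the same length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def recognize(candidate, templates):
    """Return the name of the gesture whose closest example best matches.

    `templates` maps a gesture name to a list of demonstrated strokes.
    """
    c = centered(candidate)

    def best(name):
        return min(path_distance(c, centered(t)) for t in templates[name])

    return min(templates, key=best)


if __name__ == "__main__":
    templates = {
        "horizontal": [[(float(i), 0.0) for i in range(5)]],
        "vertical": [[(0.0, float(i)) for i in range(5)]],
    }
    stroke = [(10.0 + i, 7.0 + 0.1 * i) for i in range(5)]
    print(recognize(stroke, templates))
```

Recording several demonstrations per gesture, as the tools above encourage, corresponds here to adding more strokes to each list in `templates`, which lets the classifier tolerate variation in how a gesture is articulated.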

Several computer vision-based tools have enabled developers and designers to identify, track, and recognize hand, limb, or full-body gestures. Ashbrook and Starner’s MAGIC system [26], for example, enabled designers to iteratively design motion gestures. Using MAGIC, a designer first recorded themselves executing a number of example gestures with multiple repetitions. They then browsed through the examples to view recognition results pertaining to distinguishability and consistency. A designer could then examine gesture recognition performance by performing the gestures that should or should not be recognized. In MAGIC 2.0 [160], Kohlsdorf et al. extended MAGIC to better account for the false positive motions that could occur in everyday life. By integrating indexable Symbolic Aggregate approximations, a designer could search through a database of everyday motions to determine if a given gesture could be accidentally triggered. Baytaş et al.’s Hotspotizer [39] supported the declaration of locations where gesture input and recognition should be more precise and areas where it should be less well-defined, to allow for both fine- and coarse-grained recognition and tracking. Other tools such as Paper-Mache [158], Crayons [88], and Eyepatch [207] also enabled a user to demonstrate a motion and test different recognition classifiers iteratively, albeit within a specialized interface that did not enable new recognition algorithms to be authored easily. The DejaVu system [148], however, was built within an IDE to enable gesture developers to visually and continuously monitor real-time gesture input, edit recognition algorithms, and inspect recognition results.

Although pattern matching with templates or using examples of gestures can be viable methods to quickly evaluate recognition algorithms, projects such as Gesture Coder [198] and Gesture Studio [199] supported the declaration of gestural input via high-level language-based constructs. Gesture Coder enabled multi-touch gestures to be programmed by demonstration within the Eclipse IDE. Its successor, Gesture Studio, further enabled a designer to record a demonstration of a desired gesture, revise and edit the demonstration, create high-level behaviors that were composed of multiple gestures or movement clips, and attach actions to the behaviors and gestures.

The aforementioned tools support designers in quickly capturing gestural data and prototyping recognition algorithms; however, they only evaluate recognition performance. Quill [196], in contrast, enabled designers and developers to view accuracy information as well as empirical data and visualizations about how similar a gesture was to existing gestures in a vocabulary. This software also provided active feedback, i.e., hints and advice, that a designer could use to eliminate similarities between gestures. Such feedback could be an important indicator when determining how difficult gestures would be for a user to recall.

Other projects have focused on easing the development of gestures for tangible, mobile devices. The aCAPpella system [106], for example, enabled everyday users to demonstrate context-aware behaviors for recognition, extracted and collected relevant sensor information, detected events that occurred within the sensor stream, and invited the user to annotate events corresponding to the behaviors they wished to support. The system then utilized this information to train an interaction model and incorporated it within a context-aware behavioral recognizer. With Exemplar [111], a designer could connect sensors to a PC and record examples of themselves performing desired motions. The streaming data was then visualized using small multiple views, which enabled designers to interactively evaluate different filters using the live data stream. Once satisfied, the designer could indicate which elements of a motion signal they wanted to recognize and the system would compute how accurately the signal would be recognized amongst the other recorded samples. Mogeste [245] also enabled designers to create new gesture vocabularies by recording mobile phone movement and motions, selecting elements of the movement that defined a target gesture, and testing the recognition using built-in classifiers. In a slightly different vein, a suite of applications developed by Kim and Nam [152] and Kim et al. [153] enabled designers and everyday users to prototype interactive applications that used touchscreens, cameras, or sensor-embedded objects. Designers could demonstrate a target gesture and have various motion signals captured by embedded sensors. The designer could then specify the state-transition diagram of the resulting sensor streams using an interactive visual markup language to define their desired gestures.

2.3.2 Model-Driven Development.

Aside from design-by-demonstration approaches to gestural design, other notable projects have employed declarative and theoretical models of human behavior [78, 180, 184]. Leiva et al., for example, postulated that gesture vocabularies could be made by generating a large number of gestural variants that were derivatives of a single example gesture [180]. Using the kinematic theory of rapid human movement, they argued that such variants could be used to improve recognition and negate the need to recruit participants to evaluate gesture languages. With their system, Gestures Go Go, designers could iteratively refine synthesized gestures by removing variants they found unsuitable and then generating more variants. The designers could also select and export a gesture recognizer (from a set of built-in recognizers based on the programming language of their choice) and a synthesized gesture set, to reduce the development time and effort of gesture-based applications. The generated gesture vocabulary could then be evaluated using various methods, such as Leiva et al.’s Omnis [185], GATO [183], or the KeyTime technique [181].

Work by Stern et al. viewed gesture vocabulary design as a multi-objective optimization problem [300]. With their approach, a designer identified the tasks, commands, and gestures that should be used in a system. They then conducted user experiments to determine how intuitive, stressful, and frequently used the gestures would be. These results were then used to compute an intuitive matrix, comfort and stress matrices, and a reduced gesture language matrix. The gesture language matrix was then optimized to minimize the size of the gesture vocabulary and maximize recognition accuracy. The resulting optimization was combined with the intuitive, comfort, and stress matrices to match gestures to commands such that Pareto optimal solutions were obtained. From the generated solutions, the designer could then select the gesture vocabulary that best met their preferences.
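To make the notion of Pareto optimal solutions concrete, the sketch below filters a set of candidate gesture vocabularies down to those that are not dominated on any objective. It illustrates only the final selection step and is not Stern et al.'s optimization procedure; the candidate names, the three objectives (intuitiveness, comfort, recognition accuracy), and the scores are hypothetical, and every objective is assumed to be scaled so that higher is better.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one (higher scores are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates.

    `candidates` maps a vocabulary name to a tuple of objective scores.
    """
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    }


if __name__ == "__main__":
    # (intuitiveness, comfort, recognition accuracy), all hypothetical
    candidates = {
        "A": (0.9, 0.5, 0.8),
        "B": (0.6, 0.9, 0.7),
        "C": (0.5, 0.4, 0.6),
        "D": (0.9, 0.5, 0.9),
    }
    print(sorted(pareto_front(candidates)))
```

The vocabularies that remain represent different trade-offs rather than a single winner, which is why the final choice is left to the designer's preferences.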

Hochreuter’s LemonGrasp [123] enabled designers to specify and design multi-touch gestures using tunable Manipulation-Attributes, such as speed, number of fingers, and inertia. This interface enabled designers to define gestures and the state changes (i.e., feedback) that should accompany the performance of each gesture. With Gestit [296], Spano et al. proposed a method for creating gesture interfaces that utilized compositional and declarative gesture definitions to define interaction. The framework enabled developers to use declarative notations to define their own gestures or combine existing gesture elements into gesture sequences. The notation was then used to associate gestures to specific UI views and elements and to recognize the gestures. The SNAP programming tool [155] also followed a programming-by-demonstration notion, but used a conceptual model that viewed gestures and motions as “triggers”, devices that provide feedback as “objects”, and device feedback as “responses”. The software then enabled designers to specify interactions by composing these elements.

Each of these systems ensures that the resulting gestures are easy to recognize and create, rather than focusing on user-facing aspects of gesture design. Many of these systems are also targeted toward everyday users, but it is this population that may be unaware of the challenges that repeated execution, social acceptability, or environmental context may have on the design and use of gesture vocabularies. This neglect can be detrimental to users if, for example, the resulting gestures are difficult to remember, cumbersome or awkward to execute, or illogically mapped to functionality. Such systems would be better suited as components within a larger gesture design process that incorporates processes to ensure gestures focus on these important factors as well.

2.4 Summary

Regardless of the methodology chosen, the process of creating gestural vocabularies remains a cumbersome, time-consuming, and error-prone endeavor. While each aforementioned methodology has use cases or situational requirements that it is best suited for, what one method excels at, another fails at or does not even consider (Table 1).

Table 1.

In the case of user-led methodologies, despite using gestures every day, many users are often unaware of the facets of gesture design. On the other hand, when experts design gestures, their expertise and concern for environmental or technology factors such as recognition rates or desired modalities can hamper their abilities to design gestures that are discoverable or learnable. Gesture vocabularies that are designed via modeling or by demonstration are only as good as the underlying models used to generate, model, or evaluate them. Systems such as Gestures à Go Go, LemonGrasp, Gestit, and SNAP offer thought-provoking visions of future gesture design processes because they suggest that the best gestures may be those that humans cannot even conceive of.

More recently, some researchers have proposed combining or chaining different approaches for gesture vocabulary design. For example, building upon the concept of constructing complex gestures from primitive gesture components [190], Delamare et al. explored user-defined combinations of gesture primitives that were previously designed by experts in the context of smart TV interaction. Vuletic et al. explored the chaining of User-led and Expert-led approaches to design gestures for conceptual design applications [341]. Within their approach, researchers would perform a gesture elicitation study (i.e., User-led methodology), and then professionals of varied backgrounds would be asked to evaluate the resulting gesture set (i.e., Expert-led methodology) so that the gesture vocabulary could be expanded or pruned as necessary to be suited to a larger audience and be easier to learn. Vuletic et al. found, however, that after going through this exercise and implementing the gesture set, it would be necessary for researchers to undergo additional rounds of evaluation of the resulting gesture set.

While an abundance of methodologies and systems have eased some of the processes required to create gesture vocabularies, they also call for a cohesive, clear set of criteria or factors that can be used to design and evaluate gestures. Identifying the factors that constitute an “optimal” gesture vocabulary is essential because they could further enable a dissection of the strengths and weaknesses of existing and future methodologies and systems. In addition, they would enable gesture designers to design, evaluate, and iterate on gesture vocabularies using factors that are most relevant to their use cases, rather than being constrained to the subset of factors that a methodology is currently capable of evaluating.

3 Factors Implicated in Gesture Design

To understand the multitude of factors implicated in the design of gestures, a survey of the literature on gesture design was conducted. As the terminology used to describe the facets of gestural interaction has evolved over many decades, it has resulted in a fragmented lexicon. This section synthesizes the literature on gesture design and provides a holistic and comprehensive understanding that designers, developers, researchers, and the field have been missing.

3.1 Methodology

To collect a corpus of representative literature, specific terms including “gesture design”, “design of gestures”, “gesture tools”, “gesture set design”, “gesture toolkit”, “gesture factors”, “gesture software”, and “gesture vocabulary design” were queried on the ACM Digital Library, IEEE Xplore Digital Library, Google Scholar, and Microsoft Academic Search. This resulted in 3,352 publications being collected as of April 2021. After removing duplicate entries (i.e., 1,277 publications) and publications that were not in English (i.e., 144 publications), 1,931 publications remained. To identify publications that focused on gesture design, the inclusion criteria required that a publication focus on designing gesture vocabularies for human–computer interaction scenarios, devising methodologies for gesture design, or studying the aspects that can affect gesture performance. In total, 1,643 publications were excluded because they focused on unrelated themes (e.g., American Sign Language learning, computer music, gesticulation within the fine arts, gesture recognition algorithms, and so on), republished the same results in multiple venues, or utilized a prior publication’s gesture vocabulary within a new domain instead of iterating on or developing a new gesture vocabulary. Finally, a collection of 288 publications remained, which covered the design of both 2D gestures using the fingers or hands, as well as 3D gestures using the arms, head, or feet.
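The screening counts reported above can be tallied in a few lines as a consistency check. The variable names are illustrative only; the numbers are those stated in the text.

```python
# Counts as reported in the methodology text.
collected = 3352     # publications returned by the search queries
duplicates = 1277    # duplicate entries removed
non_english = 144    # publications not in English removed
excluded = 1643      # publications failing the inclusion criteria

after_cleaning = collected - duplicates - non_english
final_corpus = after_cleaning - excluded

print(after_cleaning)  # 1931
print(final_corpus)    # 288
```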

Using the search terms as a starting point, the best practices, recommendations, experimental results, and future work themes discussed in each publication were classified using an open coding method [305]. Themes were (re)classified or aggregated as new factors emerged (e.g., learnability was found to be referred to as Fast Learning, Memorability, Recall, Ability to Recognize, Similarity, Fits Well with its Associated Function, Systematic Understanding, Appropriate Mappings, User Uptake, Action Matches Function, and so on, so all of these “factors” were aggregated under the umbrella term learnability). This process underwent many iterations and resulted in the identification of thirteen factors crucial to the design of gesture vocabularies (Figure 1). For readability and improved organization, the factors are grouped into four categories based on the interconnected relationships between factors (e.g., those related to the interactive system itself, such as feedback and recognition, were grouped into System Factors). Herein, each factor is defined and described using relevant examples and findings from the literature.

Fig. 1.

3.2 Situational Factors

Governing the development of any gesture vocabulary are the overarching goals that a designer is trying to achieve and constraints they must contend with. Within the context of gesture design, three themes important to situational constraints and opportunities emerged, including the context where the gestures will be performed, the modality used to perform the gestures, and the social factors that may influence a user’s performance and willingness to perform the gestures. Most often, these factors are found within research that employed Expert-led methodologies, as it is experts and designers who decide the domain and scope of the activities associated to the gestures.

3.2.1 Context (i.e., Users, Environments, and Tasks).

As defined by Dey, context describes any information that characterizes the situation where a gesture will be performed (i.e., people, places, and devices) [79]. The literature review demonstrated that the term context is used infrequently, whereas terms such as users, environment, or task¹ were more common. If gestures are performed in a vehicle while driving, for example, a designer should identify the limitations of sitting in the driver’s seat and the risks associated with divided attention, as well as seek opportunities that minimize the movements of hands away from the wheel or the eyes away from the road [15, 86, 250, 252]. If children [61, 69, 189, 328], seniors [61, 303, 304], or those with motor [46] or visual impairments [142, 144] will be performing the gestures, it is important to identify the unique challenges that such a population may incur. For example, if a designer is creating a gesture vocabulary for elderly users, they should consider how the physical limitations and memory issues that often occur with age may influence the recall and performance of the gestures [303, 304].

Before one can evaluate the degree to which their gesture vocabulary meets or exceeds each of the challenges or opportunities that the context imposes, designers must determine the requirements of their gesture vocabulary, which is often done before the gesture vocabulary is created. However, as the survey demonstrated, none of the gesture vocabularies created to date weighed people, places, and devices equally. Thus, a requirements evaluation can serve as a sanity check for designers, ensuring that they are aware of all the challenges that their gesture vocabulary may encounter.

To determine the requirements of their gesture vocabulary, one could use techniques such as working through scenarios [308], storyboarding [313], use case development [131], task analysis [81], or wireframing [126]. Robertson and Robertson also proposed the Volere Requirements Specification Method (VRSM), which is used within engineering, organizational management, and product design to identify the importance of preconditions of a product [272]. It contains 27 types of requirements that include, among others, legal, environmental, cost, scope, and performance. To identify a requirement, one would complete a template scorecard (i.e., Requirement Shell) to articulate the rationale and importance of the requirement, as well as identify the consequences that could occur if the requirement is not considered. This process lends itself to the holistic and extensive understanding of gesture requirements.

3.2.2 Modality.

According to Nigay, interaction modality within HCI has been discussed at multiple levels of abstraction. For example, a modality could be specified in general terms as “using gestures” when comparing with modalities such as speech and keyboard, or more specifically as “using finger gestures on a multi-touch screen” when comparing with arm or body gestures [72]. Today, a common level of abstraction has been the physical devices (e.g., touchscreen devices) and the interaction languages (e.g., a set of 2D gestures) that a user would employ to achieve their goals [232]. Modalities have also been referred to as enabling technologies [146] or tools [113] within the literature.

Within the context of gesture design and for the purpose of distinguishing the many different types of gestural interaction, we employ the lowest level of abstraction for modality, which refers to the sensory input channel used to perform a gesture and the output channel that a system uses to provide feedback to the user [72]. Designing for specific modalities has been, unsurprisingly, a very popular topic to explore.²

In terms of input, modalities come in a variety of forms: fingers [265, 302, 303, 315, 374], hands [187, 241, 244, 267], feet [134, 140, 156], nose [247, 385], the entire body [69], implements such as styli [63, 316], and tangible objects such as steering wheels [15], controllers [64], or remote controls [372]. Each input modality has a series of contexts that it works best for. Touch, for example, is best for the direct physical manipulation of objects and is ubiquitous in mobile settings, mice are preferred for their precision and accuracy, styli best mimic the nuances of inking and sketching, and the keyboard is one of the most efficient input devices. The reader is referred to Karam’s review of gestures in HCI for more specific details [146]. Such diverse input options, however, introduce constraints for designers, not only in terms of which modality to support, but also if and how execution will differ between modalities (e.g., finger and pen [316], or finger and arm [89]). Although finger and arm-based gestures are the most popular, gesture designers also need to be aware that alternative, less common input channels such as the feet, legs, elbows, and so on may be better suited for input when, for example, the hands are occupied or attention is divided (e.g., [112]). Note that work by Köpsel et al. indicated that the input modality chosen may play less of a role in gesture performance than its context [162].

In terms of output, the technology that is available in an environment can influence how gestures are executed and users’ responses to them. Many investigations have been undertaken to understand which gestures are best matched to a given technology (e.g., tablets [259], televisions [323], mobile phones [30, 48, 363, 390], multi-touch displays [173], and so on). For example, if only a large screen is available, users will have a tendency to make larger gestures, which will take longer to perform [127]. If content is being transferred between displays of different sizes, users have been found to first gesture to make the content smaller and then gesture to transfer the content [170]. Devices that have a smaller form factor such as tablets or mobile phones enable users to gesture with and on them, both when stationary and on the go [48, 316]. Larger devices such as computer monitors and wall-sized displays do not offer such opportunities. Differences have also been found in the smoothness and number of gesture repetitions users will perform while using tangibles and physical objects versus virtual proxies and digital content [217]. Even the techniques used to detect gestural input, e.g., optical versus surface acoustic wave, have been shown to impact the nature of gestural movements [97]. In other cases, environmental factors may limit the technology that can be used. For example, in-air gestures are often preferred over touch-based gestures for hygiene reasons in the kitchen [98] or during surgical procedures [274].

The utility and usability of modality-specific gesture vocabularies are often evaluated via Expert-led methodologies that involve end users or close approximations of them. Experts typically work with a specific modality, or a number of different modalities to compare performance and often have user study participants’ movements, behavior, and opinions of the modalities observed and measured [42, 66, 92, 377]. If gestures performed using a specific modality are found to require more time and effort to learn, a designer may need to modify or redesign the gestures. A designer also may need to switch to an alternative modality, but this could also result in the redesign of the entire gesture vocabulary. We recommend that designers select a modality taking a holistic consideration of the identified requirements within their desired context.

3.2.3 Social Acceptability.

Social acceptability, as articulated by Williamson, refers to the social and cultural factors that affect the user experience when a gesture is performed [358]. Such factors can include the location where the gesture will be performed, the presence and composition of the audience in this location, the age of the user, the body parts and areas that the gesture requires, the size and duration of the gesture, and so on.³ Within the literature, social acceptability has also been referred to as social comfort, cultural acceptability, and cultural transparency. Although interaction is largely considered to be an individual experience, acceptability includes both how the user feels while performing the gestures (i.e., the user’s social acceptability) and how nearby spectators will perceive the gestures (i.e., the spectator’s social acceptability [269]). Thus, the perceptions and reactions of others to the performance of gestures are important aspects of gesture design that cannot be overlooked.

Koelle et al. identified four design strategies employed in the HCI literature to improve the social acceptability of interaction techniques, which include using subtle and unobtrusive interaction to avoid negative attention, avoiding suggestiveness and misinterpretation, designing interactive devices that are accessory-like and of familiar styles, as well as making interaction more candid, transparent, and observable [159]. These design strategies, however, do not always conform with each other. Pohl et al. also found that hiding and deception have been employed to enable subtle and unobtrusive interaction in social settings [256]. For example, Anderson et al. proposed using deceptive and illusory techniques informed by principles of magic to improve the subtlety of interaction with digital devices and avoid negative effects on in-person interaction.

Rico and Brewster argued that acceptability is not purely a binary measure, but rather occurs along a continuum [269]. Some gestures may be more appropriate to perform in certain locations (e.g., in an elevator versus on a street corner) or with certain groups of spectators (e.g., with family versus strangers). The social acceptability of a gesture can be further influenced by one’s need to repeatedly perform movements due to poor recognition, if the gesture needs to be large and cannot be performed subtly, or if the gesture employs mimetic, instead of alphabetic, movements [87]. Cultural norms also play a role, as hand and bodily gestures from one culture may be offensive or inappropriate in another. When users deem gestures to be less acceptable or appropriate to perform, they will be more hesitant and less likely to utilize them, or if they do perform them, will feel uncomfortable [2, 82].

Best practices to achieve culturally- and socially-resilient gesture vocabularies recommend asking end users to perform gestures in the wild or in a laboratory. Participants are then asked via think-aloud protocols or questionnaires about how they feel while or after performing the gestures [2, 5, 82, 99]. As such methodologies require users to perform gestures in public settings or around people they do not know, “if users decide to interact by gestures (in the wild), they implicitly categorize this interface as socially acceptable” [99]. This enables a designer to evaluate social and cultural acceptability alongside other factors without incurring additional costs.

Gestures have also been recommended to be small, unobtrusive, unnoticeable [211], physically comfortable, enjoyable, similar to everyday movements or gestures used with existing technologies, and not interfere with communication [358]. As suggested by Rehfeld et al., assigning gestures to functional mappings such that gestures are universal across cultural contexts can make it easier for users to learn and remember gestures [263].

Lastly, repeated exposure to, and performance of, some gestures can result in them changing the cultural zeitgeist and becoming more socially acceptable over time [299]. Examples of now-culturally appropriate gestures include the swipe gesture that users performed on the temples of Google Glass, waving toward a Kinect sensor, and using foot gestures to interact with virtual spaces [134]. It is important for designers to remember that as certain gestures become more or less appropriate over time, it may be necessary to iterate on their gesture vocabularies later.

3.3 Cognitive Factors

In addition to situational factors, another theme that emerged was the importance of cognitive factors to gesture design. A number of studies within the literature have focused on the importance of the discoverability of the gesture vocabulary, how intuitive it is to use, how learnable the gesture vocabulary is, and how transferable the gesture vocabulary is to other domains or contexts. Each of these factors is affected by the user’s cognitive understanding of the system and the motions they should be performing. Cognitive factors are often found to be important with User-led, and occasionally Expert-led, methodologies [17, 121, 362, 364, 378].

3.3.1 Discoverability.

The quality of gestures that enables a user to access intended referents of those gestures despite a lack of knowledge about the gesture is often referred to as the discoverability of a gesture [361]. This factor has also been termed guessability, approachability, self-revealing, and metaphorically or iconically logical towards functionality⁴ and is most often considered during User-led methodologies. Regardless of the terminology used, in some contexts it may be crucial that a gesture or gesture vocabulary is discoverable. For example, if one is required to perform a gesture to initiate interaction with a kiosk that they are encountering for the first time, it is important to have a discoverable gesture vocabulary so they can quickly access system functionality, rather than become frustrated and walk away.

As proposed by Wobbrock et al. [361, 362], to determine how discoverable a gesture is, a gesture vocabulary designer can compute the agreement rate or agreeability of a set of gestures using various methods [9, 326, 331]. The agreement rate determines how similar the gestures elicited from different participants are to each other, and thus how easy the gestures would be to guess and perform in the wild [334, 340, 361]. As an alternative to this, Seto proposed the use of an in-the-wild observational paradigm, which used video data to measure how frequently gestures were performed by uninformed users in the wild and suggested that frequency could be used as a measure of discoverability [289].
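As an illustration of how agreement can be computed for a single referent, the sketch below implements two formulations that are widely used in the elicitation literature: the agreement score, A = Σ(|Pᵢ|/|P|)², and the pairwise agreement rate, AR = |P|/(|P|−1) · Σ(|Pᵢ|/|P|)² − 1/(|P|−1), where P is the set of proposals for a referent and each Pᵢ is a group of identical proposals. Whether these correspond exactly to the methods cited above is an assumption; the function names and the example labels are hypothetical, and deciding which proposals count as "identical" is left to the designer.

```python
from collections import Counter


def agreement_score(proposals):
    """A = sum over groups of identical proposals of (|P_i| / |P|)^2."""
    n = len(proposals)
    if n == 0:
        raise ValueError("at least one proposal is required")
    return sum((size / n) ** 2 for size in Counter(proposals).values())


def agreement_rate(proposals):
    """AR = |P|/(|P|-1) * A - 1/(|P|-1): the share of participant pairs in agreement."""
    n = len(proposals)
    if n < 2:
        raise ValueError("at least two proposals are required")
    return n / (n - 1) * agreement_score(proposals) - 1 / (n - 1)


# Six hypothetical participants propose a gesture for one referent.
proposals = ["swipe", "swipe", "swipe", "tap", "tap", "circle"]
print(round(agreement_score(proposals), 3))  # 0.389
print(round(agreement_rate(proposals), 3))   # 0.267
```

In this example, 4 of the 15 possible participant pairs agree (three "swipe" pairs and one "tap" pair), which matches the agreement rate of 4/15. The rate is 0 when every proposal is different and 1 when all participants propose the same gesture.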

If a gesture set or individual gesture is deemed undiscoverable, many techniques can be used to improve its discoverability. First, the output presented to users could be changed to provide pre-execution information about the gestures a system supports (i.e., self-revealing gestures [38] or visual and audio cues [20, 381]). For example, using the hover state, a gesture cheat sheet or an animation or video of the gestures that could be performed could be displayed [18, 357]. If visual feedforward and feedback cannot improve discoverability, a designer could also make use of an action-outcome mapping that mimics common actions (e.g., Hinckley et al.’s cutting and pinching gestures [121]). Piggybacking on an already familiar or discoverable gesture is another way of increasing the discoverability of functionality [378]. To reduce the need for users to discover and learn a new set of gestures when interacting with a new application or interactive device, Vatavu proposed nomadic gestures, which allow users to reuse their preferred and practiced gestures to interact with newly encountered devices [322].

As more everyday users are required to use and acquire gestures to complete day-to-day tasks, the likelihood increases that gestures that were not previously discoverable will become discoverable. Therefore, in some cases, it may be appropriate for developers to utilize widely adopted gesture vocabularies for common interaction tasks.

3.3.2 Intuitiveness.

In 1994, Raskin stated that intuitiveness refers to the degree to which a gesture makes use of transferable and existing knowledge [260]. Years later, Jacob et al. furthered this by recommending that newly proposed interaction techniques should draw strength from users’ pre-existing knowledge of the everyday, non-digital world [130]. For example, swiping to the left to flip a virtual page could be considered an intuitive gesture because this action mimics the physical movements made when turning a page [343]. Intuitiveness has also been referred to as appropriateness, familiarity, and necessity.⁵ Intuitiveness is an important factor to consider because it ensures that the “gesture afforded by the interface design is aligned to how users expect to provide input” [173] and that there is an appropriate “cognitive association between [the] command or intent, and its physical gestural expression” [300]. If the mapping between the action or movement does not match the resulting functionality, or it is hard for the user to draw parallels or build upon the existing knowledge they have, they will perform the incorrect gesture or not perform any gesture at all. Note that a gesture may be intuitive without being discoverable (e.g., prior to 2007, novice touchscreen participants would rarely guess the “pinch to zoom” gesture, though once shown by an experimenter, would recall it flawlessly [357]).

Many methods can be used to measure how intuitive a gesture is. Nielsen et al., for example, proposed two approaches [229]: (i) a bottom-up approach, similar to Wobbrock et al.’s UDG methodology [362], that presented actions or functionality to users and asked them to create a suitable gesture, and (ii) a top-down approach that showed users a gesture and asked them to select the functionality that should be associated with the gesture from a candidate list. With these approaches, it is common to ask novices [58, 101, 102, 115, 128, 300], experts [38], or members of a design or research team [100] to comment on intuitiveness via questionnaires, interviews, video transcriptions of the think-aloud words used by users while performing the gestures, and so on. As proposed by Agarwal and Prabaker, one could also empirically measure intuitiveness by computing the differences in task duration between experts and novices [3].
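One simple reading of the duration-based measure is the gap between mean novice and mean expert task times. The sketch below illustrates that reading only; the function name, the sample times, and the interpretation that a smaller gap suggests a more intuitive gesture are assumptions, not details taken from Agarwal and Prabaker.

```python
from statistics import mean


def expert_novice_gap(novice_times, expert_times):
    """Difference between mean novice and mean expert task duration (seconds).

    Assumption: a smaller gap suggests novices need little extra effort,
    i.e., the gesture is closer to what they already expect.
    """
    return mean(novice_times) - mean(expert_times)


# Hypothetical task durations (seconds) for one gesture.
novice_times = [4.0, 5.0, 6.0]
expert_times = [2.0, 3.0, 2.5]
print(expert_novice_gap(novice_times, expert_times))  # 2.5
```

Comparing this gap across the gestures in a vocabulary would give a relative ranking rather than an absolute intuitiveness score.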

If gestures are not found to be intuitive, it would be beneficial to change the mapping between the gesture and the outcome, making use of a mapping that assigns the gesture to more cognitively similar actions or mimics natural movements [121, 381]. It is also important to note that the repeated exposure to and performance of gestures in one context may make them feel more intuitive in another context. For example, Wobbrock et al. found that the traditional desktop user interface metaphor was deeply rooted in the participants’ mental model and significantly influenced the gestures they created for multitouch surfaces.

3.3.3 Learnability.

The notion of learnability refers to the ease with which new users can begin effective interaction and achieve maximal performance with a gesture [83]. Learnability is one of the most popular factors within the literature and is often referred to as fast learning, memorability, recall, recognize, similarity, fits well with its associated function, systematic, chunking, mapping, uptake, or matching.⁶

Learnability and intuitiveness are complementary, as the more intuitive a gesture is, the easier it is to learn. However, intuitiveness refers to the mapping between a movement and outcome, whereas learnability focuses on how this mapping, in addition to other factors, enables one to recall a gesture. If a user is never able to learn a gesture vocabulary, it will not be used. Learnable gesture vocabularies are beneficial because they negate the need for extensive training.

There are many methods that a designer can use to determine how learnable a gesture or gesture vocabulary is. Computing the number of gestures a user can remember is one method to determine learnability [21, 34, 50]. Wobbrock et al. proposed using techniques such as counting the gestures required for a unit action (i.e., gestures per character for text entry) or measuring the time it takes to perform each gesture [364]. These measures assume that less learnable gestures will have higher numbers of gestures per character and longer task times.
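The gestures-per-character measure can be sketched as the number of gestures a user performed divided by the number of characters in the text they produced. The function name and the example values below are hypothetical, and how corrections or failed recognitions are counted is left as a design decision.

```python
def gestures_per_character(gesture_count, transcribed_text):
    """Gestures performed divided by characters produced (lower is better)."""
    if not transcribed_text:
        raise ValueError("transcribed text must not be empty")
    return gesture_count / len(transcribed_text)


# Hypothetical session: 14 gestures (including corrections) were needed
# to enter an 11-character phrase.
gpc = gestures_per_character(14, "hello world")
print(round(gpc, 2))  # 1.27
```

A value of 1.0 would mean that every gesture produced exactly one character; values above 1.0 indicate extra gestures spent on corrections, mode switches, or retries.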

If training will be provided to a user before they will perform a gesture, then learnability may not be as important. As there have been many investigations into the best way to teach gestures, the reader is referred to Anderson and Bischof [13], Bau and Mackay [36], Bragdon et al. [50], or Yee [381] for more details.

3.3.4 Transferability.

Perkins and Salomon defined transferability as the degree to which the learning of a gesture can enhance or undermine its performance in another context [248]. Within the literature, this has also been termed generalizability, transfer to expertise, or adaptability, and has been applied to transferring learned gestures to other hands [17], modalities [294, 295], applications or devices [368], user populations [182, 325], and to expert performance [13, 34, 50, 93, 139, 384].⁷ For example, [325] framed interactive gestures as a type of knowledge that can be characterized by the information of the gesture movement, having an understanding of the situation where the gesture is performed, and the experience and skills of the user that leads to efficient execution of the gesture. [325] further proposed that gesture knowledge should be easily transferred to situations requiring different gesture articulation, sensing environments, and interpretations of users’ preferences.

Transferability is similar to learnability [67] in that to use a gesture in a different context, a user must first learn a gesture and then later recall it on demand. It is different, however, in that certain gestures lend themselves better to transferring to a different context or situation. Consider transferability to expert performance as an example. Marking Menus [171] gestures enable users to gradually improve the efficiency of gesture performance and robustness to error through repetitive executions. If gestures can be modified to chain off each other or utilize chunking, transferability can be improved [56]. This may, however, result in a gesture being composed of entirely new movements. Training and tutorials for gestures can also facilitate progress toward expert performance.

Transferability is often only considered when Expert-led methodologies are used. To measure the transferability of a gesture or entire gesture vocabulary, one can use a retention and transfer paradigm from the motor learning literature [13, 17]. As suggested by Silpasuwanchai and Ren [294, 295], a designer could also ask users from the target population to comment on other body parts that could be used to generate a given gesture. Given that this factor is dependent on the user’s cognitive ability to remember and execute motor movements, it is imperative that users are involved in the design or evaluation of a gesture vocabulary that is intended to be transferable [13, 17].

3.4 Physical Factors

The third theme that emerged within the literature comprised factors that are influenced by the physical movements that users make. These factors largely relate to the efficiency with which gestures can be performed, the complexity of the movements, how ergonomically appropriate the movements are, how natural the movements feel, and the degree to which the interface or feedback provided to the user is occluded by the motions of the gestures.

3.4.1 Complexity.

The Oxford Dictionary defines complexity as the state or quality of being intricate or complicated [243]. When referring to gestures, this implies there is a certain degree of difficulty to the movements that must be remembered or performed. Complexity may also be referred to as mental load, performability, simplicity, or attention.⁸ The gestures that a user is required to perform should have the appropriate level of complexity given the task at hand. When using a mobile phone, for example, the gesture performed to unlock the phone, which may be done hundreds of times a day, should be as uncomplicated and simple as possible so that the user can perform it in a variety of situations with relative ease. In other situations, however, it may be appropriate to require that a more complex gesture be performed. If one is using an in-air gestural interface to direct the movements of an endoscopic surgical robot, for example, the complexity of the movements should match those the surgeon would normally perform during non-robotic surgery to ensure that the robot performs the correct maneuvers and that gestures are not accidentally identified.

Measuring the complexity of a gesture requires one to critically analyze the constituents of the gesture movement, such as the number of steps, repetitive patterns, or distinct, discrete movements each gesture requires [285]. Production time measurements and efficiency estimations with the models used to evaluate efficiency can also indicate the complexity of a gesture. As recommended by Yee, one should also manually check if a gesture requires unusual movements that increase the effort one would need to exert [381]. As a majority of the methods to determine complexity require empirical measurements, it is quite common for research concerned about complexity to utilize Expert-led or Computationally-based methodologies to create gesture vocabularies. For example, Leiva et al. proposed Omnis Prædictio, a generic technique that can provide user-independent estimations of the many numerical features of gestures, such as production time, speed, and curviness [185].
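
As a minimal sketch of the counting approach described above, the following Python snippet tallies the strokes and sharp direction changes in a gesture recorded as lists of (x, y) points. The function names, the 45-degree turn threshold, and the definition of a "segment" are illustrative assumptions and are not taken from the cited work.

```python
import math


def count_direction_changes(points, angle_threshold_deg=45.0):
    """Count sharp turns along one stroke (a hypothetical complexity proxy)."""
    changes = 0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        # Wrap the heading difference into [-pi, pi] before taking its size.
        diff = heading_out - heading_in
        turn = abs(math.atan2(math.sin(diff), math.cos(diff)))
        if math.degrees(turn) >= angle_threshold_deg:
            changes += 1
    return changes


def complexity_summary(strokes):
    """Summarize a gesture given as a list of strokes (lists of (x, y) points)."""
    corners = sum(count_direction_changes(stroke) for stroke in strokes)
    return {
        "strokes": len(strokes),
        "direction_changes": corners,
        # Assumption: each stroke is one segment plus one per sharp turn.
        "segments": corners + len(strokes),
    }


l_shape = [[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]]
summary = complexity_summary(l_shape)
```

Under these assumptions, the L-shaped stroke is summarized as one stroke with one direction change and two segments. The threshold would need to be tuned to the sampling rate and noise of the input device.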

Complexity often influences other factors as well, e.g., a complex gesture may be hard to learn because it is difficult to use and not discoverable [179]. As advantageous as it is to have a simple gesture vocabulary, in some contexts, it is just as important to avoid overloading gestures (e.g., reusing the same gesture across applications but mapping it to different outcomes). Complexity can also be related to efficiency, as increased complexity often correlates with decreased efficiency (see Section 3.4.2 for a list of efficiency measures). These factors should thus be considered in tandem.

If a gesture is found to be highly complex, the designer can reduce the number of segments of the gesture movement, the number of fingers or hands required to perform the gesture, or completely redesign the gesture itself.

3.4.2 Efficiency.

As per Office, efficiency refers to the accuracy and completeness with which users perform an action or gesture [239]. Within the context of gesture design, efficiency is often referred to as difficulty, human performance, interaction cost, speed, duration, effort, easy to perform, or easy operation.⁹ Efficiency is important for gesture-based systems because it enables users to increase their throughput as well as decrease the amount of physical and mental effort that they need to exert. Selecting a single object using a lasso gesture, for example, takes longer, covers a greater distance, and increases the likelihood of incorrect selections when compared to a simple tap gesture. Similarly, while using a mobile keyboard, typing by tapping letters with one’s thumb is less efficient than swiping through the letters in a continuous manner, as the latter removes the “up and down” motion between letter selections [168, 387]. Furthermore, systems that support “lazy” execution, where the general shape of swipes is considered rather than specific locations, enable experts to generate gestural input with increased efficiency [387].

Efficiency is often considered when using Expert-led or Computationally-based methodologies because the experts or algorithms designing the gestures can use established techniques, such as measuring the time necessary to perform each gesture [278], or apply estimation models such as Vatavu et al.’s [330] or Isokoski’s [129] efficiency measures, Cao and Zhai’s CLC Model [59], Leiva et al.’s KeyTime technique for unistroke gestures [181], Bjerre and Pedersen’s [43] or Rice and Lartigue’s [268] Touch-level Models, Batran and Dunlop’s enhanced Keystroke-Level Model [35], or Leiva et al.’s GATO technique [183].
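
The simplest of these techniques, measuring production time, can be sketched directly from timestamped input samples. The snippet below is an illustration only: the (x, y, t) sample format, the units, and the function names are assumptions, and it does not implement any of the estimation models cited above.

```python
import math


def path_length(samples):
    """Total distance travelled; samples are (x, y, t) tuples in input order."""
    return sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0, _), (x1, y1, _) in zip(samples, samples[1:])
    )


def production_time(samples):
    """Elapsed time between the first and last sample, in seconds."""
    return samples[-1][2] - samples[0][2] if samples else 0.0


def efficiency_summary(samples):
    """Collect the basic measures an experimenter might log per gesture."""
    elapsed = production_time(samples)
    distance = path_length(samples)
    return {
        "time_s": elapsed,
        "length_px": distance,
        "speed_px_per_s": distance / elapsed if elapsed > 0 else 0.0,
    }


# A short stroke followed by a half-second dwell at the end point.
stroke = [(0, 0, 0.0), (3, 4, 0.5), (3, 4, 1.0)]
measures = efficiency_summary(stroke)
```

Logging time and distance together makes it possible to compare, for instance, a lasso selection against a tap on the same target, as in the example given earlier in this section.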

If the gestures are found to be inefficient, one can make several modifications to the gesture. The duration or size of the gesture could be decreased. A designer could also reduce the number of repetitions that are required to perform the gesture (e.g., change the control to display ratio), or completely redesign the gesture itself. In addition to modifying the gestures, beginning gesture prediction during input can also reduce the time needed for a system to recognize gestures and trigger corresponding actions [41, 286].

3.4.3 Ergonomics.

The Office defines ergonomics as the human anatomical, anthropometric, physiological, and bio-mechanical characteristics as they relate to physical activity [239]. Thus, with respect to gestures, users should feel physically well when performing a gesture and they should not encounter any pain, tiredness, or discomfort. When designing gestures, it is important to consider the potential risks that may be involved. Some risks may be immediately apparent, such as asking users to perform a complex gesture that requires visual attention on the touchscreen of a car, whereas others may be more long term, such as repetitive movements that lead to repetitive stress injuries or carpal tunnel syndrome [229]. Although many factors encompass ergonomics, it is important that a user is aware of the positions that their body will be in when performing a gesture to avoid fatigue, such as with the “gorilla arm” [275]. Within the literature, ergonomics is also often referred to as physical ergonomics, point of articulation, safety, fatigue, comfort, stress, feasibility, bio-mechanical risk, or gesture mechanics.¹⁰

To determine if gestures are ergonomic, designers should evaluate if their proposed gestures will be safe and comfortable to perform. Designers should evaluate each gesture in their vocabulary by examining:

– The use of outer positions of the joints of the finger, wrist, shoulder, and so on [229]
– The use of repetition [229]
– How relaxed the muscles are [229]
– The use of static positions [229]
– How relaxed the “middle” neutral position is between outer positions [229]
– The internal and external force exerted on joints [229]
– If the gestures will stop body fluids from flowing [229]
– Increases in heart rate and fatigue [34]
– The amount of arm fatigue [117]
– The prevalence of multi-finger gestures (as opposed to single touch and two-finger gestures) [266]
– The prevalence of bi-manual movements (as opposed to sequential uni-manual movements) [266]
– The flexibility in the number of strokes required for a gesture [266]
– The use of unfamiliar shapes and complex geometries [266]
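
One way a designer might record the outcome of such a review is as a per-gesture checklist. The sketch below covers only a subset of the criteria listed above. The class, its field names, and the binary flagging scheme are hypothetical conveniences for illustration and are not part of any cited method.

```python
from dataclasses import dataclass, fields


@dataclass
class ErgonomicChecklist:
    """One flag per concern; True means the concern applies to the gesture."""
    uses_outer_joint_positions: bool = False
    is_repetitive: bool = False
    requires_tense_muscles: bool = False
    holds_static_position: bool = False
    applies_force_on_joints: bool = False
    requires_many_fingers: bool = False
    requires_bimanual_movement: bool = False
    uses_unfamiliar_shapes: bool = False


def flagged_concerns(checklist):
    """Return the names of all concerns that were flagged, in field order."""
    return [f.name for f in fields(checklist) if getattr(checklist, f.name)]


# Hypothetical review of a gesture that is held still and repeated often.
review = ErgonomicChecklist(is_repetitive=True, holds_static_position=True)
concerns = flagged_concerns(review)
```

A binary flag is a deliberate simplification. Criteria such as heart rate or arm fatigue would in practice be measured rather than ticked off.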

In addition to these guidelines, a gesture may be safe to perform infrequently but repeated performances of it may fatigue a user or gradually degrade other aspects of their posture (e.g., they begin to slouch, shift their weight onto one leg, lean against another surface, etc.). Thus, it is important to consider the complexity of a gesture, in addition to how it will be recognized, when thinking about its ergonomics. Within the literature, gesture ergonomics are most often associated with Expert-led or Computationally-based methodologies because end users infrequently consider the effects of performing a gesture multiple times over a long duration, or within different, possibly more tiring contexts. There has been some exploration of ergonomics within User-led methodologies, i.e., external physical constraints have been employed to emulate fatigue when eliciting gestures. For example, Ruiz and Vogel fastened weights to users’ wrists to simulate low arm fatigue [282]. However, more exploration is required to identify whether and how accurately external constraints can simulate the many kinds of ergonomic constraints and criteria that were identified from the literature.

Ergonomics also depends on the modality that gestures are performed with. Less physical effort will be exerted if a gesture is performed with fingers than with arms, for example. When directly manipulating elements on interactive surfaces, objects that are too large or too small can be problematic. A common approach to addressing the inherent ergonomic issues of modalities is to require that users perform small or indirect gestures with parts of their body that require the least physical movement [140, 251, 355].

If a gesture is deemed to be unsafe or harmful, a designer may be able to change the size of the gesture, the orientation of the body part making the gesture, where the gesture is performed, the pressure required (if utilized), or the number of fingers required. In severe cases, the gesture itself may need to be completely redesigned [229].

3.4.4 Occlusion.

How much of a system’s output is covered or blocked when a gesture is performed is commonly referred to as occlusion or the fat finger problem [336]. For example, on a smartphone, if a user wants to place a pin at a location on a map by holding their finger down for a fixed period of time, they would not know if the pin was placed correctly because their finger would occlude the portion of the screen underneath their finger [379]. This problem becomes even more concerning on ultra-small displays such as smartwatches [377]. When content is occluded, the user either misses important information or feedback, has to relocate where they perform a gesture, or change the size, scale, or orientation of the gesture they perform [336]. As such, the occlusion factor is related to the feedback factor and occlusion is of most relevance to systems that require direct input, such as touchscreens or stylus-enabled devices.¹¹

To determine if a gesture occludes other content, designers first need to consider how the movements each gesture requires will obscure or block any visual information or interaction widgets that may be presented to the user. As proposed by Vogel and Balakrishnan, the Hand Occlusion Model can be used to determine which areas of the screen will be obscured by the hand, wrist, and arm [336]. If this model is not a viable option for other body parts or posture, a designer could also visually inspect the gestures or the intended interface to determine how much content is blocked by the movements of the gestures, similar to Brandl et al. [51]. Because it is possible to determine if a gesture will occlude existing or future content without requiring that a user physically perform the gesture, this factor is often considered during Expert-led or Computationally-based methodologies.
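
The visual-inspection approach can be approximated in code when the layout of the interface is known. The following sketch estimates how much of each widget is covered by an occluding region. It is not the Hand Occlusion Model: the occluder is simplified to a single axis-aligned rectangle, and the function names, the (x, y, width, height) rectangle format, and the 50% threshold are assumptions made for illustration.

```python
def overlap_fraction(widget, occluder):
    """Fraction of the widget's area covered by the occluder.

    Both arguments are axis-aligned rectangles given as (x, y, width, height).
    """
    wx, wy, ww, wh = widget
    ox, oy, ow, oh = occluder
    # Width and height of the intersection, clamped at zero when disjoint.
    inter_w = max(0.0, min(wx + ww, ox + ow) - max(wx, ox))
    inter_h = max(0.0, min(wy + wh, oy + oh) - max(wy, oy))
    area = ww * wh
    return (inter_w * inter_h) / area if area > 0 else 0.0


def occluded_widgets(widgets, occluder, threshold=0.5):
    """Names of widgets whose covered fraction meets the threshold."""
    return [
        name
        for name, rect in widgets.items()
        if overlap_fraction(rect, occluder) >= threshold
    ]


layout = {"ok": (0, 0, 10, 10), "cancel": (100, 100, 10, 10)}
hand = (5, 0, 10, 10)  # hypothetical bounding box of the hand and finger
blocked = occluded_widgets(layout, hand)
```

A more faithful analysis would replace the rectangle with a shape fitted to the hand, wrist, and arm, but the same per-widget coverage check would still apply.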

If a designer has determined that one or more gestures will occlude content, a few options are available. First, the designer can maintain the gesture itself but change where it is performed on the screen. Making use of dedicated or offset gesture areas could also be a viable option [50, 381]. Alternatively, one could change the size of the gesture (i.e., instead of requiring motions that span an entire surface, a micro version of the gesture could be required). Call-outs of the occluded content or offsets of the input could also be possible modifications [336, 337].

3.5 System Factors

The final category of factors that emerged were those related to the features and techniques contained within the systems and devices themselves. The two factors within this category arise from the techniques and technologies that are used to recognize the user’s input and provide feedback to the user during and after the gesture has been performed.

3.5.1 Feedback.

Another factor that was prevalent in the literature was feedback. As defined by Shneiderman, feedback refers to the communication that a system has with a user resulting directly from the user’s actions (i.e., gestures) [287]. Feedback is important because it allows a user to determine whether the actual functionality of the system matches their mental model of the functionality of the system [233]. Within gestural systems, feedback can inform the user whether a gesture they made is currently being recognized or was recognized. In some systems, this information can be conveyed using a sound [298], whereas in others visual changes will occur within the interface [17] or haptic, olfactory, or tactile feedback may be rendered or generated. Feedback has also been referred to as feedback ambiguity or system response,¹² and is most often considered when using Expert-led methodologies.

To determine what feedback is appropriate for a given action or functionality, one could elicit feedback mechanisms from users or evaluate the effectiveness of various techniques that were designed by experts during user studies with pre-defined sets of gestures. These studies could measure noticeability, frustration, and so on. The design and evaluation of feedback are similar to the complexity of gesture design itself because there are a multitude of intertwined factors that influence feedback including size, shape, scale, persistence, duration, frequency, modality, and so on [293]. In essence, feedback should clearly communicate success and sources of errors to users without ambiguity so that they can take corrective actions when using a system [356].

Although feedback occurs after the execution of a gesture, feedforward (i.e., the information provided while performing a gesture [36, 75, 76, 275]) is also important to consider as it can improve learnability. For example, with OctoPocus, on-screen feedback and feedforward are provided while drawing a gesture. The feedback and feedforward enable users to not only understand the system’s status but also be reminded about possible gesture paths, which can improve the execution and learning of the gestures [36].

3.5.2 Recognition.

The recognition of a gesture is the process of tracking a gesture from its initial representation through to its later conversion into a semantically meaningful command [262]. Within a subsection of the research literature, recognition techniques (i.e., detectability, ease of implementation, orientation, recognition rate, selection accuracy, sensing, or risk of confusion with natural movements) have been of the utmost importance.¹³ When a system is unable to recognize a gesture, or recognizes a gesture incorrectly, users will try to perform the gesture again and again in the same way, make larger and more exaggerated movements until it is recognized, or they will stop performing the gesture altogether [40]. Recognition is most often considered when Expert-led or Computationally-based methodologies are used.

Recognition goes hand-in-hand with discoverability, i.e., if a user walks up to a system for the first time and performs the “correct” gesture, the gesture has no utility if the system does not accurately recognize it. Most often, the performance of a recognition technique can be evaluated using existing databases or recordings of gesture performances, or in real time by capturing users performing gestures. As there are a multitude of gesture recognition techniques in the literature, the reader is referred to Khan and Ibraheem [150], Murthy and Jadon [222], Suarez and Murphy [307], or Pisharady and Saerbeck [254] for reviews of possible gesture recognition techniques.
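
Evaluating a recognizer against recorded performances typically reduces to comparing intended and recognized labels. The sketch below assumes that each recording has already been passed through some recognizer and that the results are available as (intended, recognized) label pairs. The function names and the sample data are hypothetical.

```python
from collections import Counter


def recognition_rate(pairs):
    """Share of (intended, recognized) pairs in which the labels match."""
    if not pairs:
        return 0.0
    return sum(1 for intended, recognized in pairs if intended == recognized) / len(pairs)


def confusions(pairs):
    """Count each (intended, recognized) mismatch to expose confusable gestures."""
    return Counter(
        (intended, recognized)
        for intended, recognized in pairs
        if intended != recognized
    )


# Hypothetical outcomes for four recorded gesture performances.
results = [
    ("swipe", "swipe"),
    ("swipe", "tap"),
    ("tap", "tap"),
    ("pinch", "pinch"),
]
rate = recognition_rate(results)
mistakes = confusions(results)
```

With this sample data, three of the four performances are recognized correctly, and the only confusion is a swipe recognized as a tap. Tracking which pairs are confused, rather than only the overall rate, indicates which gestures in a vocabulary may need to be redesigned.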

3.6 Discussion

The categories of factors were also found to have different design and evaluation requirements. The situational factors, such as context and modality, often need to be approached on a case-by-case basis, where gestures are designed to suit the specific needs of the contexts and leverage the advantages of different modalities. As such, these factors were generally evaluated by examining whether the designed gestures could help users complete tasks under certain contexts and with certain modalities, and depending on the context could be assessed using expert or computational methodologies.

On the other hand, the physical, cognitive, and system factors were found to be general criteria that should apply to all gesture systems that are designed. Because of this, physical factors and system factors were generally evaluated using datasets or real-time motions that were collected from users but were most often only considered during Expert-led or Computationally-based methodologies. Cognitive factors were generally evaluated via questionnaires with participants during or at the conclusion of an experiment to capture their impressions without impeding their performance of the gestures. These factors were often assessed only during User-led methodologies.

Thus, as the factor identification found, there are many factors that are implicated in the design of gestures (Table 2). Some factors stand relatively isolated, such as social acceptability, whereas others have a high interdependence, such as learnability and discoverability. Across the literature that was reviewed, there was not one experiment or methodology that considered all of these factors simultaneously, thereby underscoring the importance of reviewing and synthesizing this body of literature and identifying these factors. In what follows, we assess the three categories of gesture design methodologies in more detail, focusing on the implications that can occur when one chooses to use a given methodology to design a gesture vocabulary.

Table 2.

4 Holistic Importance of the Identified Factors to Gesture Design

The identification and synthesis of the 13 aforementioned factors should provide the community with a more consistent vocabulary and understanding of the characteristics that are important to consider when designing a gesture vocabulary. The analysis also provides an opportunity to holistically explore how these factors influence the gesture design process that one chooses to employ and the implications that such factors can have if they are not considered when designing a gesture vocabulary. Herein, we re-examine the three categories of gesture-design methodologies (i.e., Expert-Led, User-Led, and Computationally-Based) from a factor-oriented perspective to better understand the degree to which each methodology focuses on each factor and highlight the potential pitfalls of using such methods by applying the definitions and heuristics synthesized from the literature (Table 3).

Table 3.

4.1 Situational Factors: Context

Among the three methodologies, User-led methodologies best address the context of gesture vocabularies because end-users understand the context within which they will be performing gestures so they are often the best entities to relate designed gestures to targeted tasks and environments [179].

During Expert-led methodologies, if an expert is not well-versed or does not have extensive experience with the context within which gestures will be performed, then it is important that such methodologies contain a phase or component that involves the target end-user population (or a reasonably close approximation of them) in the design process. Such a phase will enable the expert to compensate for the knowledge they are missing and gather much needed insights and feedback from users. For example, when designing gestures for users with visual or motor impairments, it is essential to involve the target population during the design process so that the requirements and preferences of such users can be captured [142, 253].

Computationally-based methodologies are the least effective at understanding and designing for the nuances of context because it is difficult for algorithms to predict or quantitatively measure characteristics such as who will be using a gesture vocabulary, where they will be doing so, and what devices they will be using. In some sense, Computationally-based tools such as MAGIC [26] and MAGIC 2.0 [160] could be viewed as tools that understood a primitive level of context because they sought to learn and identify motions that should not be recognized (e.g., false positives or unintentional input within everyday use), however, this level of context is far more general and vague than that which could be obtained when User-led or Expert-led methodologies are used. If, however, a researcher explicitly provides such information to a computational model, such as how Stern et al. specified task-based information as part of their multi-objective optimization algorithm [300], then it is possible for such methodologies to consider context during gesture design. Such methodologies, however, do require that the expert, programmer, or designer has knowledge about the context to provide to such algorithms.

4.2 Situational Factors: Modality

Expert-led methodologies excel at considering the implications of different modalities because experts are much more familiar with the characteristics and limitations of a chosen modality than users, especially for emerging technologies that they may have created [18, 119, 376, 377]. For example, by instrumenting a digital stylus with multi-touch sensing capabilities along the entire barrel of the stylus, Hinckley et al. were able to recognize and identify a new set of stylus barrel gripping gestures that could be used to enhance interaction with multi-touch surfaces [119].

Computationally-based methodologies could be used to evaluate the influence of a modality on a gesture vocabulary, but only if the properties and functionality of a modality can be modeled or recorded by experts. For example, with the aCAPella [106], Exemplar [111], and Mogeste [245] systems, in addition to work by Kim and Nam and Kim et al., designers or developers could record data from modalities such as touchscreens, mobile phones, cameras, RFID tags, or sensor-embedded objects and use these data streams to generate gesture vocabularies [152, 153]. These approaches, however, require that the designer or developer has background knowledge and insights into which data they should record from such devices and has the ability to access and record such data.

Unlike experts, whose knowledge about a modality comes from their authoring of the modality or their experience watching participants use the modality, users are far less likely to be experts with a given modality. Interestingly, many research projects have solicited gestures from novice users who were using a modality or device for the first time (e.g., interacting with smart glasses [317] or unmanned aerial vehicles [249]). While novices’ limited experiences may help them create gestures that are more discoverable, their naivete may encourage them to reuse gestures from existing applications or contexts [85, 214, 281] or design gestures that would be awkward or socially unacceptable to perform [204]. This thus makes User-led methodologies the least effective at evaluating the role of modality.

4.3 Situational Factors: Social Acceptability

The best practice to ensure gestures are socially acceptable is to involve end users in the design process and to encourage them to actively think about the social context that the gestures will be performed in [2, 82, 99, 269]. User-led methodologies are thus the best methodology to use to do so.

By default, Expert-led methodologies are not the preferred method to solicit the social acceptability of gestures because, unless the experts are well-versed in the nuances of the social context in which their gesture vocabulary will be used, they cannot understand the social or cultural implications of each gesture’s performance. However, if such methodologies are augmented to allow participants to provide feedback using think-aloud protocols or questionnaires, then such methodologies can be used to understand how participants feel while or after performing gestures [2, 5, 82, 99]. For example, Expert-led gesture vocabularies that were created without considering social acceptability recommended that interaction with remote displays should occur at substantial distances from a display (e.g., 250 cm) [205, 275]. However, research by Ahlström et al. found that in-air gestures performed further than 30 cm away from a device or for longer than 6 seconds were perceived as significantly less socially acceptable in public settings than those performed closer to a device or for a shorter duration [5].

Computationally-based methodologies are, unfortunately, not suitable to ensure gestures are socially acceptable because the many variables that are relevant to social acceptability, such as culture, audience, and environment cannot be easily quantified and often vary immensely from situation to situation.

4.4 Cognitive Factors: Discoverability

Discoverable gestures are often informed by the past knowledge, experiences, and habits of users [361]. In many cases, the history that a user brings to an elicitation study differs significantly from the history and experiences of experts and is difficult to capture within a computational model. As such, it is often difficult for experts or computational models to conceive gestures that will be discoverable by general or target populations. For example, both Hinckley et al. and Xia et al. introduced expressive bimanual pen and touch interaction techniques. However, participants reported that they would not have “guessed” many of the gestures without guidance [121, 376].

User-led methodologies, however, seek to mimic users’ first attempts at a gesture in the wild with the hope that the derived gestures will be easily discoverable. However, individual differences between users make it challenging for elicited gestures to be universally discoverable. For example, Wobbrock et al. found that among the 27 commands for which participants created gestures in their study, only 2 gestures (i.e., move a little and move a lot) were universally agreed upon by all participants [362]. As suggested by Vatavu and Wobbrock, in some cases, one may need to compute how dissimilar the gestures for a given referent are [332]. While gestures that receive moderate agreement ratings may still be discoverable after several attempts, the lack of consensus between users suggests that gestures that are created via User-led methodologies can improve discoverability but do not fully guarantee it.

4.5 Cognitive Factors: Intuitiveness

Similar to discoverability, intuitive gestures also rely on users’ past experience and first-hand knowledge. Gestures designed by end users should be intuitive. However, Wobbrock et al. found low agreement amongst users for the majority of the referent commands in their study [362]. In addition, while gestures such as double-tap, touch and hold, pinch, and spread were all included in the UDG set in Wobbrock et al.’s work [362], these same gestures were not discovered or understood by the older adult participants in Leitao and Silva’s work [179]. As such, gestures that are created via User-led methodologies may have increased intuitiveness but may not necessarily be intuitive.

In a similar vein, the use of Expert-led methodologies does not guarantee that one will solicit an intuitive gesture vocabulary, as experts’ prior experience and (likely) overexposure to gestural interaction may bias what they consider to be intuitive [224]. Lastly, because it is difficult for computational models to predict or create meaningful cognitive mappings, the use of Computationally-based methodologies is also not suitable for the creation of intuitive gesture vocabularies.

4.6 Cognitive Factors: Learnability

Gestures that are developed via User-led methodologies build upon users’ collective past experiences and knowledge, and are selected based on the agreement score computed across all of the gestures that were elicited for a given referent [9, 58, 326, 362]. In theory, this should result in gestures that are easy to learn because they are the most common gestures that were elicited from participants, and are thus the most familiar or easy to recall. However, because participants rarely get to reflect on or refine the entire set of gestures they have created during an experiment [82, 84, 105, 295, 370, 373], they do not have the opportunity to holistically consider how similar subsets of gestures may be, and thus how easy it would be to learn the entire gesture vocabulary. If participants are provided with refinement opportunities, then User-led methodologies could be a viable method to evaluate the learnability of a gesture vocabulary.
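
For readers unfamiliar with agreement analysis, the sketch below computes one commonly used formulation of the per-referent agreement score, in which the squared share of participants behind each distinct proposal is summed. It also computes a pairwise variant that gives the proportion of participant pairs who proposed the same gesture. The function names and the sample proposals are hypothetical, and the studies cited above may have used different formulations or grouping rules.

```python
from collections import Counter


def agreement_score(proposals):
    """Sum of squared proportions of identical proposals for one referent."""
    total = len(proposals)
    if total == 0:
        return 0.0
    return sum((count / total) ** 2 for count in Counter(proposals).values())


def agreement_rate(proposals):
    """Proportion of participant pairs who proposed the same gesture."""
    total = len(proposals)
    if total < 2:
        return 0.0  # Assumption: agreement is undefined for fewer than two people.
    return total / (total - 1) * agreement_score(proposals) - 1 / (total - 1)


# Hypothetical proposals from four participants for a "zoom out" referent.
zoom_out = ["pinch", "pinch", "pinch", "spread"]
score = agreement_score(zoom_out)  # (3/4)^2 + (1/4)^2 = 0.625
rate = agreement_rate(zoom_out)    # 3 agreeing pairs out of 6, i.e. 0.5
```

Both measures treat proposals as either identical or different, so how proposals are grouped before counting has a direct effect on the result.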

In terms of Expert-led methodologies, if an expert can assess how similar a subset of gestures is (e.g., [19, 197, 330]) or is aware of best practises for gesture design (e.g., use mnemonics [277], use semaphoric or pantomimic mappings [6], be analogous to the physical effects of the real world [387], and so on), then it is possible that experts could perform preliminary assessments about the learnability of a gesture vocabulary. However, because experts have extensive experience watching users perform gestures or interact with devices, and have immense experience interacting with technology themselves, their very familiarity with gestural interaction may lead them to underestimate the difficulty that a target population of users may have while learning a gesture vocabulary. This was demonstrated by Nacenta et al., who found that gesture vocabularies that were created via User-led methodologies were easier for participants to recall after 24 hours and were preferred to gesture vocabularies that were created by experts [224].

As the literature does not yet have a clear understanding of the role that discoverability and intuitiveness have on the learnability of gesture vocabularies, and there is much ongoing research attempting to understand how to best teach gestures to users (e.g., [13, 36, 49]), it is difficult to empirically model or abstract which movement and function mappings will be best for a vocabulary of gestures. As such, Computationally-based methodologies do not currently support the evaluation of gesture learnability.

4.7 Cognitive Factors: Transferability

Transferable gestures are those that can be executed using different modalities, in different contexts, or with different devices [13, 17, 369]. To increase the likelihood of transferability, some experts have suggested that chaining or chunking techniques may be useful as they allow constituent gesture components to be reassembled as needed [171, 325], while others have recommended the use of mindful abstraction and performance metaphors during training [369]. Because the design of such gestures requires substantial expertise and holistic knowledge about user behaviours and a system’s functionality, and also requires a deep understanding of the characteristics of different modalities, contexts, and devices [325], the use of Expert-led methodologies that involve end users in the evaluation process [13, 17] is the only viable way to design gesture vocabularies that fulfill such needs.

Although users may reuse gestures that they are familiar with during User-led methodologies, this does not guarantee that the gestures they propose will be appropriate for other devices, contexts, or applications [214]. Because understanding how transferable gestures are requires expertise [325], it is unlikely that users will be able to design gestures that are universal or applicable to a variety of contexts. As such, User-led methodologies are ineffective for ensuring that gestures are transferable.

As the research literature does not yet have a comprehensive understanding of the characteristics of gestures that make them transferable, there are no models or frameworks that can be integrated within a Computationally-based gesture design methodology to evaluate the transferability of a gesture vocabulary. Thus, similar to User-led methodologies, Computationally-based methodologies are not suitable to assess gesture transferability.

4.8 Physical Factors: Complexity

As noted in Section 3.4.1, there are many measures that can be computed by experts or integrated within computational models to measure the complexity of a gesture, such as counting the number of steps that need to be performed, the number of discrete movements that are made, or the number of repetitive movements that are required [285]. Experts or computational models can also make use of more complex measures such as those found in Leiva et al.’s Omnis Prædictio software, which can measure the turning angle, density, aspect, and perimeter to area ratio of a gesture [185]. Because these measures exist, and they are trivial to compute, both Expert-led and Computationally-based methodologies are suitable methodologies to ensure that the complexity of individual gestures, and entire gesture vocabularies themselves, are as low as possible.
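
To illustrate why such geometric measures are inexpensive to compute, the sketch below derives three simple shape features from a single stroke given as (x, y) points. The definitions used here (bounding-box aspect ratio, path length divided by bounding-box area, and summed absolute turning angle) are assumptions chosen for illustration and may differ from the feature definitions used in the cited software.

```python
import math


def bounding_box(points):
    """Return (min_x, min_y, width, height) of the stroke's bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def shape_features(points):
    """Compute illustrative geometric features for one stroke."""
    _, _, width, height = bounding_box(points)
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    turning = 0.0
    for p0, p1, p2 in zip(points, points[1:], points[2:]):
        heading_in = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
        heading_out = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
        diff = heading_out - heading_in
        # Wrap into [-pi, pi] so a left turn across the +/-pi seam is not inflated.
        turning += abs(math.atan2(math.sin(diff), math.cos(diff)))
    area = width * height
    return {
        "aspect": width / height if height > 0 else math.inf,
        "length_to_area": length / area if area > 0 else math.inf,
        "total_turning_rad": turning,
    }


# A closed square traced counter-clockwise from the origin.
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
features = shape_features(square)
```

For the square above, the aspect ratio is 1, the path length is 8 over a bounding-box area of 4, and the three interior corners contribute a quarter turn each. Degenerate strokes, such as a perfectly horizontal line, have a zero-area bounding box and are reported as infinite here.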

On the other hand, with User-led methodologies, it is unlikely that users will create gestures that are complex to produce or intricate to perform, especially as users are likely to reuse gestures from other devices or situations during elicitation studies [214]. However, because not all User-led methodologies provide users with the opportunity to review the entire set of gestures they have proposed or see the final gesture set after acceptability criteria have been applied [82, 105, 295], they may neglect to consider the overall complexity of an entire gesture vocabulary. For this reason, depending on the specific instantiation of User-led methodology that is utilized, users may or may not be able to evaluate the complexity of the gestures that they create.

4.9 Physical Factors: Efficiency

Expert-led methodologies are well suited to evaluating efficiency because experts can harness established techniques or metrics within their experimental software to measure how efficient a gesture is. The simplest such metric is the time necessary to perform each gesture [278]; however, depending on the type of gesture and input modality, one could also apply Vatavu et al.’s [330] or Isokoski’s [129] efficiency measures, Cao and Zhai’s CLC Model [59], Leiva et al.’s KeyTime technique for unistroke gestures [181], Bjerre and Pedersen’s [43] or Rice and Lartigue’s [268] Touch-level Models, Batran and Dunlop’s enhanced Keystroke-Level Models [35], or Leiva et al.’s GATO technique [183].
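The simplest of these metrics, production time, can be sketched as follows. The log format (a list of (x, y, timestamp in milliseconds) samples per trial) and the function names are assumptions made for illustration rather than the format used by any of the cited tools.

```python
from statistics import mean

def production_time(samples):
    """Elapsed time between the first and last sample of one trial (ms)."""
    return samples[-1][2] - samples[0][2]

def mean_production_times(trials):
    """Map each gesture name to its mean production time across trials."""
    return {
        name: mean(production_time(samples) for samples in repetitions)
        for name, repetitions in trials.items()
    }

# Hypothetical logged trials: gesture name -> list of trials,
# each trial being a list of (x, y, t_ms) samples.
trials = {
    "circle": [[(0, 0, 0), (1, 1, 400)], [(0, 0, 100), (1, 1, 700)]],
    "swipe": [[(0, 0, 0), (5, 0, 150)]],
}
times = mean_production_times(trials)
```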

Because the models and techniques that experts can use within Expert-led methodologies utilize much of the same type of information as that required by Computationally-based methodologies, one could imagine that such efficiency measurement techniques could be integrated into the software used for Computationally-based gesture design. While such methodologies currently do not include measures of efficiency, they would allow for such computations to occur automatically alongside evaluations of recognizer accuracy, thus further increasing the utility of Computationally-based methodologies.

Because the focus of many User-led methodologies is to find mappings between functions and movements that are easy to discover, it is not common for users to be concerned with efficiency [334]. As users are often asked to design one gesture per referent or perform a small number of repetitions of a movement until they are “happy” with the gesture they have made [334, 340], users thus have limited opportunities to become familiar with the gestures they have designed or perform them for extended periods of time. This often results in User-led methodologies being unable to evaluate efficiency unless users are explicitly asked about how easy the gestures they have designed would be to perform [22, 276, 282, 362].

4.10 Physical Factors: Ergonomics

As identified within research by Barclay et al. [34], Hinckley et al. [118], Nielsen et al. [229], and Rekik et al. [266], there are many facets of ergonomics that one should attend to while creating gesture vocabularies. Although users may have personal experience with the effects of performing repetitive or tiring movements [221, 225], unless they are explicitly told to attend to characteristics such as the forces exerted on the joints while performing various movements, which types of movements can stop body fluids from flowing, or the arm fatigue that may occur when performing certain motions [282], it is unlikely that they will consider these characteristics. As such, User-led methodologies that do not inform the user about these characteristics are the least suitable for creating ergonomic gesture vocabularies.

On the other hand, many experts are aware of the role that such ergonomic characteristics can have during gesture production. In some cases, this may be due to their years of experience watching participants complete tasks during user studies (e.g., seeing the gorilla arm phenomenon occur first hand [44]), receiving feedback about how uncomfortable certain gestures are to perform [7, 80, 151, 165, 269], or their awareness of the literature on gesture ergonomics [57, 107, 210, 246, 258]. Regardless of how they obtained their expertise about ergonomics, the use of Expert-led methodologies allows these expert designers to implicitly evaluate their gesture vocabulary for ergonomic issues throughout the implementation, testing, and debugging phases of their experimental software or gesture recognition algorithm development. This unique opportunity to iteratively refine a vocabulary through first-hand experience cannot be found with other types of methodologies.

Much work has also been conducted to model the bio-mechanical characteristics of the human body, and has resulted in a number of bio-mechanical models or metrics that describe movement. For example, work by Hincapié-Ramos et al. created the consumed endurance metric, which is the ratio of total time to endurance and can be used to characterize the gorilla arm effect during mid-air interaction [117]. In similar research by Jang et al., the cumulative arm fatigue of mid-air interaction was computed using calculations about a user’s perceived exertion and arm motion kinetics [133]. Other research such as that by Bachynskyi et al. identified 11 clusters of bio-mechanical movements that had distinct performance, muscular, and spatio-temporal characteristics and could be used to summarize muscle movement during interactive tasks [32]. Although such models and metrics have yet to be integrated into the computational-based gesture design tools that are available today, one could assume that such integration would enable the automatic detection of some characteristics that may be too complex or dangerous for a human designer to test and evaluate themselves, e.g., if a gesture stopped body fluid from flowing to a certain body part or after how many repetitions a repetitive stress injury may occur.
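As a worked illustration of the ratio described above, the sketch below computes consumed endurance for a session of repeated gestures. It assumes that an endurance estimate (in seconds) is already available; deriving endurance from a bio-mechanical model is outside this sketch, and the function names and example values are hypothetical.

```python
def session_time(durations_s, repetitions):
    """Total interaction time for a session, in seconds.

    durations_s: gesture name -> time to perform the gesture once
    repetitions: gesture name -> number of times it is performed
    """
    return sum(durations_s[name] * count for name, count in repetitions.items())

def consumed_endurance(total_time_s, endurance_s):
    """Ratio of total interaction time to endurance time."""
    if endurance_s <= 0:
        raise ValueError("endurance must be positive")
    return total_time_s / endurance_s

# Hypothetical mid-air session: values are illustrative, not measured.
durations = {"swipe": 1.5, "hold": 4.0}
counts = {"swipe": 20, "hold": 5}
total = session_time(durations, counts)
ratio = consumed_endurance(total, endurance_s=200.0)
```

A ratio approaching or exceeding 1 would suggest that the session exhausts the estimated endurance, which is the kind of signal a computational design tool could surface automatically.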

4.11 Physical Factors: Occlusion

Because experts are aware of the graphics or information that will be provided on an interface before, during, and after the production of a gesture, it is unsurprising that experts, and Expert-led methodologies, are best suited to consider issues related to occlusion. As demonstrated by Brandl et al.’s work with multi-touch tabletops and styli, experts often have ample prior experience and knowledge about how the hands or arms may move around a surface and could block or occlude content that is displayed [51]. In the case of Brandl et al.’s research, the researchers used their prior knowledge to create handedness-aware menus and adapt the location of UI widgets so that the information they presented to users would not be covered by users’ hands or styli.

Although not currently integrated within the computational tools that have been designed to aid in the development of gesture vocabularies, computational models such as Vogel et al.’s model of hand occlusion would be a very useful addition [339]. As this research showed, the small variety of hand, wrist, and forearm postures that are used while inking can be abstracted into a circle and rectangle unit that can be reoriented around the position of a stylus nib, provided that the location of the stylus nib and the handedness of the user are known. This model can then be used to relocate the position and orientation of graphics and UI widgets on an interface [336]. Similar models have also been developed for single user [74, 338] and multi-user [223] touch-based interaction on multi-touch tabletops [27, 279, 380].
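A simplified sketch of a circle-and-rectangle occlusion test is given below. It is loosely inspired by the description above and is not the published model: the parameterization, the mirroring used for left-handed users, the screen coordinate convention (y increasing downward), and all numeric values are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class OcclusionShape:
    offset_x: float        # nib -> circle centre (right-handed user)
    offset_y: float
    radius: float          # circle approximating the hand
    forearm_angle: float   # radians; direction the forearm extends
    forearm_width: float
    forearm_length: float

def is_occluded(point, nib, shape, right_handed=True):
    """Return True if `point` falls under the circle or the rectangle."""
    sign = 1 if right_handed else -1  # mirror horizontally for left hands
    cx = nib[0] + sign * shape.offset_x
    cy = nib[1] + shape.offset_y
    dx, dy = point[0] - cx, point[1] - cy
    if math.hypot(dx, dy) <= shape.radius:
        return True
    angle = shape.forearm_angle if right_handed else math.pi - shape.forearm_angle
    ux, uy = math.cos(angle), math.sin(angle)
    along = dx * ux + dy * uy      # distance along the forearm axis
    across = -dx * uy + dy * ux    # distance perpendicular to the axis
    return 0 <= along <= shape.forearm_length and abs(across) <= shape.forearm_width / 2

# Hypothetical shape parameters in pixels.
shape = OcclusionShape(30, 30, 40, math.pi / 4, 60, 400)
nib = (100, 100)
```

With such a test, a layout routine could check candidate widget positions against the predicted occluded region and move any widget for which `is_occluded` returns True.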

User-led methodologies are the least effective at considering occlusion because users are often unaware of how their, or others’, movements can occlude information that is being presented [31]. In addition, these methodologies typically use a simple test environment to elicit gestures or ask participants to make modifications to their movements to overcome technical limitations, thus making it difficult for users to fully understand the visual context that will be present during the real world performance of the gestures [62, 135, 157, 255, 319, 321, 333, 362]. These practices, and thus the use of User-led methodologies, decrease users’ abilities to design gestures that will not have a negative influence on user experiences.

4.12 System Factors: Feedback

The design of effective feedback or feedforward solutions for gestural systems requires one to have a holistic understanding of the ways in which the timing, duration, modality, semantic meaning, and presentation schedules of feedback will influence gesture vocabularies [13, 36, 293]. Given that much expertise is required to (i) be aware of such factors and their interdependencies, (ii) understand how to measure them, (iii) be able to design feedback techniques that fulfill them, and (iv) be able to interpret users’ reactions to them, it is unsurprising that Expert-led methodologies are the only viable methodologies to use when designing effective feedback for gesture vocabularies.

While Computationally-based methodologies may be able to help designers and developers assess how the timing or duration of a gesture can influence its recognition rate, current tools and software are unable to simulate users’ responses to various forms of feedback. If such tools were extended to include information from crowdsourced studies that evaluated different feedback techniques, similar to how CrowdLearner enabled the rapid creation of mobile gesture recognizers [11] and the crowd was used to evaluate the social acceptability of spatial user interactions on head-mounted displays [8], then it may be possible for Computationally-based methodologies to be viable solutions to evaluate feedback in the future.

Although users may be quick to recognize when they perform an action and do not receive any feedback [36, 75, 93, 164, 189], their lack of knowledge about how an interface senses their input, how it determines if feedback should be provided, how it determines which feedback should be provided, and how it generates or visualizes feedback results in users, and thus User-led methodologies, being unable to evaluate feedback during gesture design.

4.13 System Factors: Recognition

Computationally-based methodologies are the best suited of the three categories of methodologies to ensure that gesture vocabularies can be correctly recognized by a system because almost all of the tools that have been created to assist with gesture design include recognition accuracy as a measure that can be quantified or visualized [88, 158, 172, 195, 207, 278, 292, 327]. Because designers and developers can directly see potential recognition performance in the same tool that they are using to generate gestures, it becomes trivial to use such methodologies to evaluate the impact that different recognition algorithms, datasets, or sensing techniques have on a gesture vocabulary.

While creating gestural interfaces and running experiments to evaluate them, it is common for experts to have some form of gesture recognition within their system, either via a series of algorithms that monitor sensor data (e.g., [36, 48, 63, 224, 377]), or by using a Wizard-of-Oz approach (e.g., [149, 154, 297]). Thus, many experts are aware of which data would be needed to recognize a gesture, how the similarity between gestures can influence recognition, and the role that poor recognition would have on user experiences. Therefore, it is not uncommon for Expert-led methodologies to consider recognition during gesture vocabulary creation.

Because many of the users who participate in experiments that are conducted using User-led methodologies are not developers or designers, but rather laypeople, their knowledge about how a technology works, the data that can be or is sensed while a gesture is performed, or the techniques that are available to recognize a gesture, is lacking [71, 105, 135, 258, 334]. Thus, they are often the least qualified to make judgements about how recognizable a single gesture or an entire gesture vocabulary will be, resulting in User-led methodologies being the least suitable methodology to evaluate this factor.

4.14 Summary

The above evaluation used a factor-centric view of the three categories of gesture design methodologies that are commonly used within the literature. Based on this evaluation, it appears that User-led or Expert-led methodologies are better suited to consider situational factors; however, the end users involved in these design processes may not have the prior knowledge required to leverage the strength of a given modality. User-led methodologies also seem suitable to evaluate some cognitive factors; however, novice users may not be aware of the many issues that can hamper these factors, so it is difficult to definitively recommend a category of methodologies to use. Expert-led and Computationally-based methodologies appear to excel at supporting physical and system factors because the extensive knowledge and expertise that experts have accumulated, and the existence of models derived from this knowledge, can be integrated into systems to evaluate these factor categories.

Although each methodology has its own strengths and weaknesses, the analysis demonstrated that no one methodology is currently capable of considering all factors; Expert-led methodologies, however, are capable of evaluating a subset of factors from each factor category. As the degree to which these methodologies could be modified to consider each factor, or more factors within a factor category, is currently unclear, we next identify research directions that are needed to clarify and identify how these methodologies could be used more fruitfully in the future.

5 Future Research Agendas

The review of the literature on gesture design, the identification of the factors that are important to consider when designing gesture vocabularies, and the analysis of how different methodologies do or do not support the evaluation of different factors, suggested that there are many research avenues that should be considered to improve the design of gestures. In what follows, we present a potential gesture design process that could be used to iteratively evaluate a subset of factors that a designer deems appropriate for a given use case. We also discuss some additional factors that arose during the literature review and highlight how they could impact the ever-changing ecosystem of gesture design.

5.1 A Potential User-Centered, Factor-Centric Gesture Design Process

As indicated above, one limitation of existing gesture design approaches is that there is a mismatch between the sets of factors a methodology is suitable for and the sets of factors gesture designers may wish to optimize. Rather than selecting one methodology and being unable to evaluate a given factor, one possible alternative is to take a user-centered, factor-centric approach, wherein one uses a process that enables different factors to be iteratively refined, evaluated, or removed during design time, whilst respecting the inter-dependencies that exist between various factors.

One possible instantiation of such a design process is depicted in Figure 2. Within this example process, there are four phases that a designer would work through while designing their gesture vocabulary: identifying the preconditions that will dictate the user experience with the gestures, assigning prioritization weights to each of the evaluation factors based on the requirements from the preconditions, creating an initial gesture design using an existing methodology (or using a stock gesture vocabulary), and continually refining and evaluating the vocabulary while keeping in mind the weights that were assigned to different factors, before finally arriving at the final gesture vocabulary.

Fig. 2.

5.1.1 Understanding Requirements and Prioritizing Factors.

With such an approach (Figure 2), designers would first determine and understand the requirements of their gesture vocabulary. In this example, user experience factors such as context, modality, recognition, and feedback, which will influence the user experience and the designer’s goals, are examined to determine the degree to which other factors should be weighted when iterating on a gesture vocabulary. For example, if a system is to be used with a mall kiosk, the discoverability of the gestures might be weighted higher than ergonomics because a given gesture may be performed infrequently. The identification of requirements will not only help designers form a complete understanding of the goals that are guiding their user experience, but also serve as a sanity check to ensure that a designer is aware of all the challenges their gesture vocabulary may encounter.

As many have demonstrated [224, 266, 330, 335], there are tradeoffs between many of the factors that were identified. As these tradeoffs make it impossible to design a gesture vocabulary that would fulfill all the requirements of the factors, a designer should then prioritize the evaluation factors (and reconsider the effectiveness of the feedback and the recognition accuracy), given the requirements of the preconditions. One way to do this would be for a designer to assign each factor an importance descriptor, such as high, medium, or low [353]. As an example, if the end user population is blind, then it may be appropriate to rate factors such as Occlusion or Social Acceptability low, whereas factors such as Discoverability and Recognition should be rated high.
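One way to record such importance descriptors is sketched below, using the example of a blind end user population. The data structure, the function name, and the choice of "medium" as the default rating for unspecified factors are assumptions made for illustration.

```python
from enum import IntEnum

class Importance(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# The 13 factors identified in this work, grouped by category.
FACTORS = [
    "Context", "Modality", "Social Acceptability",                        # Situational
    "Discoverability", "Intuitiveness", "Learnability", "Transferability",  # Cognitive
    "Efficiency", "Complexity", "Ergonomics", "Occlusion",               # Physical
    "Feedback", "Recognition",                                            # System
]

def prioritize(overrides, default=Importance.MEDIUM):
    """Rate every factor, using `default` where no override is given."""
    unknown = set(overrides) - set(FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    return {factor: overrides.get(factor, default) for factor in FACTORS}

# Ratings for the blind end user example described above.
ratings = prioritize({
    "Occlusion": Importance.LOW,
    "Social Acceptability": Importance.LOW,
    "Discoverability": Importance.HIGH,
    "Recognition": Importance.HIGH,
})
```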

It is also worth noting that executing the various evaluation and refinement methodologies also requires time and resources (e.g., equipment and algorithms). Therefore, we recommend gesture designers consider the accessibility and suitability of the different methodologies regarding each of the factors when adopting this new gesture design approach.

5.1.2 Iteratively Evaluating and Refining Gesture Vocabulary.

After prioritizing the factors, a designer could then use their preferred method to seed an initial gesture vocabulary (e.g., using a User-led, Expert-led, or Computationally-based methodology, or using an existing, stock gesture vocabulary). Once the initial gesture vocabulary has been developed, the designer could then work through each of the factors based on their priority, iteratively evaluating, refining, and modifying the proposed gesture vocabulary using the methods summarized in Section 3 and Table 2. When a tradeoff must occur, factors with high or medium ratings should receive priority over lower rated ones. Factors that are interrelated, such as Complexity and Efficiency, should be evaluated holistically so that any changes that result do not cause a cascading effect in subsequent iterations over the gesture vocabulary.

The point at which each factor is considered should also be determined based on how likely each factor would be to cause detrimental ripple effects across the entire gesture vocabulary, similar to what is recommended for factors that are interrelated. Factors that, when considered, would require modifications to a number of gestures or the entire vocabulary, should have higher priority and thus be evaluated before those that would require minimal modifications. For example, a factor such as Social Acceptability could require that an entirely new gesture be created to conform to cultural or location norms. In comparison, there are user interface changes that could be implemented to decrease the Occlusion of a gesture. Therefore, within the context of this example, Occlusion should be considered later in the process than Social Acceptability.
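The ordering described above can be sketched as a simple sort. Treating the importance rating as the primary key and the expected ripple effect as a tiebreaker is one possible interpretation, and the numeric ripple estimates (a guess at how many gestures a change would touch) are hypothetical values chosen for illustration.

```python
from typing import NamedTuple

class Factor(NamedTuple):
    name: str
    importance: int  # 3 = high, 2 = medium, 1 = low
    ripple: int      # hypothetical estimate of gestures needing rework

def evaluation_order(factors):
    """Order factors by importance, then by expected ripple effect."""
    ranked = sorted(factors, key=lambda f: (-f.importance, -f.ripple))
    return [f.name for f in ranked]

# Occlusion can often be addressed through user interface changes (ripple 0),
# whereas Social Acceptability may require entirely new gestures.
factors = [
    Factor("Occlusion", importance=2, ripple=0),
    Factor("Social Acceptability", importance=2, ripple=5),
    Factor("Discoverability", importance=3, ripple=1),
]
order = evaluation_order(factors)
```

In this example, Social Acceptability is evaluated before Occlusion even though both share the same importance rating, which matches the ordering argued for above.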

A challenge of any iterative design process is to determine when to stop. Most often, an endpoint is reached when one runs out of time or resources; however, achieving the perfect gesture vocabulary may never be possible as the design process itself is inherently one large set of tradeoffs. After iterative evaluation and refinement, the resulting gesture vocabulary should fulfill the requirements of the preconditions and the factors that were deemed to be of high importance because important factors should have received enough revision, and been revised early enough, during the process.

5.1.3 Summary.

Although this process is but an example, we believe that the creation and use of user-centered, factor-centric approaches have the potential to ensure that the gesture vocabularies that are designed will be the most usable, safe, and beneficial for their target populations. Different from existing gesture design methodologies, which focus on designing suitable gestures by leveraging the expertise, experience, and capability of experts, users, or systems individually, taking a user-centered, factor-centric approach to gesture design should enable the benefits of each different methodology to be used, when appropriate.

Future work is needed to validate this new user-centered, factor-centric gesture design approach by comparing it with existing methodologies, not only to ensure that the approach results in more appropriate gesture sets, but also to evaluate the costs associated with its use (e.g., number of iterative cycles needed, monetary cost of running multiple user studies, etc.) and gain insights into reducing the cost of iterations. If nothing else, we hope that the proposal and outline of such an approach will improve awareness about the challenges inherent in the design of gestures, the dependencies that exist between different factors, and the potential pitfalls of using the most popular or easy-to-use method to create gesture vocabularies.

5.2 The Evolving Ecosystem of Gesture Design Factors

Although the literature review and analysis identified 13 factors that are critical for the design of gestures, it may never be possible to identify and distill the complete set of factors that influence how users remember, learn, and perform gestures, and how systems can support them in these tasks. We first discuss the role of factor interdependencies on gesture design, detail some additional, higher-level factors that emerged, and then provide commentary on the “naturalness” of gestures.

Understanding the complex connections among various factors is essential for gesture design, as it enables designers to better understand the implications of prioritizing different factors and resolving tradeoffs when designing gesture vocabularies. The identification of the 13 factors was a first step toward constructing a comprehensive picture of gestural interaction. However, much future work is needed to better understand how to disentangle factors such as Complexity and Efficiency, Intuitiveness and Learnability, Ergonomics and Recognition, Occlusion and Feedback, and so on, as well as if and how they should be designed for independently. In addition to the identified factors, it is also possible that new primitive factors may arise and should be added to the ecosystem of factors as the field furthers its understanding of gesture design. While the user-focused, factor-centric approach that was proposed is extensible and should allow for the addition of new factors, care must be taken to identify and understand the relationships that exist between the newly discovered and existing factors. To encourage the inclusion of new factors into the ecosystem, we plan to construct an online, accessible, and community-maintained database and visualizations of the factors and relevant research articles about these factors to ensure the ecosystem can stay relevant and will evolve with time.

During the literature review, we encountered three other factors that are not included in the final set of 13 factors. The first factor that emerged was the accessibility of gestures, e.g., for older adults [179, 253], as well as for users who have motor [388] or visual impairments [142, 143, 145]. Within the factors that we identified, we did not include accessibility as a distinct factor, because accessibility is a higher-level factor that can be decomposed into a number of the factors that we identified, such as Context, Complexity, Learnability, and Feedback. For example, accessibility is related to situational factors such as Context, because users’ accessibility requirements need to be understood and considered by designers. It is also related to cognitive and physical factors such as Complexity and Learnability, because a user’s cognitive and physical abilities may pose different design constraints on these factors [179, 253]. Accessibility is also associated with system factors, such as Feedback, because tactile instead of visual feedback may need to be evaluated for users with visual impairments [143]. As this factor, and likely others, can be decomposed into a subset of the 13 factors that this present research identified, it seems important to first establish the set of primitive factors before the inclusion of higher-level factors.

The second factor was multi-user gestural interaction (i.e., the use of gestural interaction in multi-user collaborative or cooperative settings [174, 215, 216]). The factor identification and analysis provided within this article was limited to single user use cases because, as the analysis began to demonstrate, there was yet to be a consensus about all of the factors implicated in the design of single user gestures, let alone a methodology that could evaluate all of the factors at the same time. As the multi-user use case is a more complex version of the single-user scenarios that are commonly evaluated and used as use cases in the literature, it did not seem appropriate to dive into this more complex topic before a solid foundation of single user gestural interaction was developed. While the findings about some factors such as Social Acceptability and Context may seem, at first glance, to be the most applicable to these scenarios, there is much need and opportunity to explore how all 13 factors influence these scenarios and also to identify any additional factors that could be unique to multi-user settings (e.g., multi-user cooperation, user roles, territoriality, and so on).

Lastly, one of the most popular factors that emerged was naturalness, which has also been referred to as fluidity, feelings, satisfaction, controllability, affect, and familiarity.¹⁴ Naturalness, however, is a complex factor to consider because it is difficult to define and measure objectively. Baudel and Beaudouin-Lafon argued that gestural interaction is natural because it builds upon the manipulation and gestural communication skills that humans acquire naturally [37]. Wigdor and Wixon, however, proposed that naturalness is not in fact a design factor at all, but rather an experience goal, wherein a system should evoke a feeling of effortlessness and enjoyment that results in users “act[ing] and feel[ing] like a natural” in its use [357]. These “natural” feelings are often compounded by many relevant factors, such as Intuitiveness, Learnability, and Ergonomics. The pinch to zoom gesture, for example, does not mimic or build upon existing hand motions or metaphors that users are accustomed to, but over repeated presentations and performances of the gesture, has been described as a fluid, satisfying, familiar gesture by users who perform it everyday [357].

Gestural interaction has often been used as one example of a “natural” user interface when compared to the Windows, Icons, Menus, Pointer user interface metaphor [320]. Despite naturalness having been repeatedly referred to in the literature, the definition of naturalness has been inconsistent [37, 357] and was often used alongside descriptions of other factors such as Learnability, Intuitiveness, and Complexity [119, 121, 359, 376]. Given the lack of consensus regarding what constitutes a “natural” movement or feeling and the fact that naturalness is often referred to as an experience goal rather than as a design factor [101, 208, 300], naturalness was not considered to be one of the factors crucial to the design of gestures within the context of this work. We respectfully encourage the community to avoid the use of the terms “natural” or “naturalness” when describing interactive systems, and instead focus on using terms to describe the situational, cognitive, physical, and system factors that govern gestural interaction, such as those that were identified, analyzed, and summarized in this work.

6 Conclusion

Over the decades, gestural interaction has become an increasingly dominant way to interact with computing devices, such as touchscreens, mobile phones, or watches, or interact with immersive experiences, such as augmented and virtual reality. Despite the widespread adoption of devices with gesture sensing capabilities, the design of gesture vocabularies that suit the various needs and constraints of an application, its context, and its users, is still a challenging task. This work documents the efforts undertaken within the HCI community to design and understand gestural interaction, through the invention of new input modalities and sensing techniques, as well as new ways to measure and evaluate gesture vocabularies.

While a significant amount of work has explored different gesture design methodologies and techniques that can address the problems inherent in gestural interaction, there is a lack of work that has holistically understood, identified, or analyzed the various factors that are essential to consider when designing gestural vocabularies. This work reviewed the literature on gesture design and summarized the three main gesture design methodologies used today, i.e., Expert-led, User-led, and Computationally-based, as well as identified 13 factors, separated into four categories, that were crucial to the design of gestural experiences. These include Situational (i.e., Context, Modality, and Social Acceptability), Cognitive (i.e., Discoverability, Intuitiveness, Learnability, and Transferability), Physical (i.e., Efficiency, Complexity, Ergonomics, and Occlusion), and System (i.e., Feedback and Recognition) factors. Best practices to evaluate and refine gestures were also summarized for each of the identified factors, and a factor-centric analysis of the existing gesture-design methodologies demonstrated that none of the existing methodologies satisfy all the identified factors.

We thus proposed how iterative, user-focused, factor-centric approaches could serve as a practical guide for gesture designers to develop suitable gesture vocabularies in the future. Such methodologies could identify the preconditions that will influence user experiences with the gestures, assign weights to each factor to indicate their relative priority, call for the creation of an initial gesture vocabulary, and encourage designers to continually refine and evaluate the gesture vocabulary while being mindful of the importance of different factors. Lastly, we concluded with a short discussion on three additional factors that we encountered (i.e., multi-user, accessibility, and naturalness) and discussed how these factors are either compositions of the 13 factors that we identified or are descriptors of experience goals, rather than design factors.

Through these exercises, we believe that this article has provided a foundational understanding of the factors that influence gestural design, how these factors can be evaluated, and the implications that arise when different methodologies do or do not allow for an evaluation of such factors. Although there is much research still to be done with respect to modeling and understanding factors such as Transferability, Feedback, and Intuitiveness, in addition to examining more complex scenarios involving multiple users, this research should help designers, developers, and researchers develop a holistic understanding of the elements and characteristics that govern gestural interaction and should pave the way for much future innovation with respect to the gesture design methodologies, tools, and evaluations that will be possible.

Footnotes

¹

Reviewed literature included [3, 6, 10, 15, 23, 28, 31, 33, 36, 37, 38, 45, 47, 48, 52, 54, 55, 60, 61, 62, 64, 65, 68, 69, 73, 86, 90, 91, 93, 96, 97, 98, 100, 102, 103, 104, 109, 110, 114, 115, 116, 117, 118, 119, 122, 124, 132, 134, 138, 142, 142, 144, 151, 157, 160, 161, 167, 169, 170, 175, 176, 177, 179, 186, 187, 189, 191, 192, 193, 201, 206, 208, 213, 215, 218, 219, 227, 234, 235, 236, 237, 238, 240, 242, 244, 246, 250, 252, 253, 255, 258, 261, 264, 267, 271, 280, 281, 284, 288, 290, 291, 294, 295, 301, 302, 304, 311, 314, 315, 316, 321, 323, 324, 329, 331, 333, 338, 342, 344, 345, 347, 349, 350, 351, 354, 355, 356, 362, 364, 366, 367, 368, 371, 372, 375, 382, 384, 386].

²

See references including [6, 15, 17, 28, 31, 37, 40, 47, 48, 54, 55, 68, 69, 73, 85, 86, 90, 93, 94, 95, 96, 97, 100, 108, 119, 122, 125, 132, 134, 138, 142, 147, 157, 161, 166, 176, 186, 187, 191, 201, 204, 208, 213, 218, 227, 235, 236, 240, 241, 244, 250, 257, 265, 266, 280, 282, 283, 284, 288, 300, 301, 302, 303, 304, 310, 311, 315, 323, 324, 331, 335, 338, 343, 344, 345, 346, 347, 348, 350, 367, 371, 376, 377, 382, 385].

³

See references including [2, 5, 14, 82, 87, 99, 134, 159, 211, 240, 256, 263, 269, 288, 299, 335, 350, 358].

⁴

Reviewed references included [1, 9, 31, 58, 110, 175, 271, 289, 326, 331, 332, 334, 340, 345, 360, 361, 362, 378, 385, 386].

⁵

See references including [1, 3, 4, 38, 54, 58, 73, 86, 98, 100, 101, 102, 115, 116, 128, 130, 134, 157, 169, 170, 173, 176, 177, 179, 191, 206, 224, 229, 235, 236, 237, 244, 250, 252, 255, 260, 282, 294, 295, 300, 301, 319, 342, 350, 371, 384].

⁶

As suggested by the following: [1, 6, 9, 10, 13, 19, 21, 24, 33, 34, 36, 37, 38, 49, 50, 55, 83, 93, 94, 98, 101, 104, 116, 137, 169, 175, 178, 186, 191, 192, 197, 208, 224, 244, 250, 266, 277, 283, 284, 288, 294, 295, 300, 318, 324, 326, 330, 333, 342, 344, 347, 356, 364, 366, 368, 371, 372, 387].

⁷

See references including [13, 17, 34, 50, 56, 67, 93, 139, 171, 182, 248, 294, 295, 325, 368, 369, 384].

⁸

See references including [1, 15, 24, 37, 49, 54, 59, 73, 79, 82, 91, 94, 103, 105, 165, 175, 179, 185, 187, 188, 240, 244, 246, 255, 281, 285, 291, 295, 318, 324, 333, 335, 362, 368, 372, 381, 386, 387].

⁹

Suggested references include [22, 35, 43, 56, 59, 103, 125, 129, 136, 181, 183, 268, 276, 278, 288, 311, 314, 330, 334, 364, 366, 387].

¹⁰

See references including [1, 32, 34, 37, 44, 57, 62, 90, 98, 100, 107, 117, 118, 120, 125, 133, 210, 229, 236, 240, 244, 246, 258, 266, 267, 282, 300, 315, 319, 323, 335, 342, 350, 351, 366, 367, 372, 374, 375, 382].

¹¹

See references including [27, 31, 40, 50, 51, 74, 93, 201, 223, 279, 291, 336, 337, 338, 339, 346, 354, 355, 374, 379, 380, 387].

¹²

See references including [17, 36, 75, 86, 91, 93, 98, 103, 115, 120, 163, 164, 189, 233, 234, 275, 287, 293, 298, 333, 337, 342, 351, 354, 355, 356, 379, 384].

¹³

Suggested references include [12, 20, 24, 29, 34, 40, 64, 88, 90, 100, 108, 109, 119, 124, 132, 136, 141, 147, 150, 151, 158, 160, 163, 164, 165, 166, 167, 172, 186, 187, 195, 203, 207, 219, 222, 230, 234, 238, 241, 244, 254, 257, 262, 264, 265, 266, 270, 273, 278, 283, 292, 300, 307, 314, 315, 321, 327, 342, 343, 348, 351, 354, 372, 387, 389].

¹⁴

See references including [1, 28, 29, 37, 54, 55, 65, 73, 86, 91, 98, 100, 101, 102, 103, 108, 110, 115, 116, 119, 134, 157, 169, 170, 175, 176, 177, 186, 187, 206, 208, 213, 226, 235, 235, 237, 238, 242, 244, 246, 250, 252, 255, 258, 281, 282, 284, 288, 291, 294, 295, 301, 302, 314, 319, 323, 324, 329, 333, 347, 349, 350, 362, 368, 382, 384].

References

[1]

Hajar G. H. Abadi, Lim Yan Peng, and Ali Mohammad Hossein Zadeh. 2012. Guessability study on considering cultural values in gesture design for different user interfaces. In Proceedings of the International Proceedings of Economics Development and Research, Vol. 37. 1–4.

