1 Introduction

The advent of the Macintosh in 1984 was an important event in the history of the human-machine interface and of computation in general. It fundamentally changed the way people thought of computers and how to interact with them. Since this shift in human-computer interaction (HCI), technology in general has evolved drastically. Memory capacity, processing speeds and graphics power have skyrocketed, and the Web and, more recently, the cloud have revolutionized the way we use computers. Interfaces, however, have yet to change (Underkoffler 2010).

Humans still mainly interact with digital media using mouse and keyboard, sticks, levers, buttons or touch, with everything displayed on a screen. A shift in how we interact with digital media is underway, though: voice control combined with artificial intelligence is increasingly finding its way into people's interaction patterns. For now, however, it only works as a supplement to screen-based interfaces and cannot replace them completely.

Humans' instinctive way of interacting with objects is using their hands. It is therefore logical to look at gesture control when we talk about the next generation of HCI. Gesture control is not something new. Steven Spielberg's movie Minority Report, in which Tom Cruise controls a futuristic gesture-based interface, is often used as a reference in these discussions, and with good reason: the movie producers actually hired a team of designers and engineers to design the interface as an R&D project (Underkoffler 2010). The movie premiered in 2002, and ever since, different companies have tried to make use of this technology and revolutionize how we interact with digital media.

To discuss the potential of gesture-based interaction, this article will first look at some key milestones in the history of gesture control, and then look at different domains and digital media where gesture control can be applied.

2 Background

2.1 Method

The article relies mostly on a review of literature relevant to the subject, focusing on articles in the domains of computer science, psychology, robotics, engineering and interaction design. Articles from computer science, engineering and robotics were important for understanding the technical challenges of gesture-based interaction, while articles from psychology and interaction design gave an understanding of what has been done in terms of research and testing.

2.2 Gesture Control Defined

The term gesture control is used in a wide range of contexts and can be interpreted differently. The Gartner Glossary defines it as follows: “Gesture control is the ability to recognize and interpret movements of the human body in order to interact with and control a computer system without direct physical contact” (Gesture Control 2019).

The definition above covers “movements of the human body”, whereas this article will focus solely on hand gestures. Combinations of controller inputs and gestures (e.g. VR controllers) are also outside the scope of this article.

2.3 Brief History of Gesture Control

To get an understanding of the future of gesture control, one has to look at where gesture control has been used in the past. This section will mention some key milestones in the history of gesture control.

The first prototype of a gesture-tracking glove, the Sayre Glove, emerged in 1977. It used flexible tubes with a light source at one end and a photocell at the other, mounted along each finger of the glove. When the wearer bent a finger, the amount of light passing through the tube decreased, giving a measure of finger flexion. Later in the 70s and the 80s, several different glove prototypes emerged, using different types of sensors (Premaratne 2014).

The use of cameras to recognize hand gestures started very early, alongside the development of the first wearable sensor gloves. There were many hurdles at that time in interpreting camera-based gestures: computing power was low and available only on mainframe computers, and cameras offered poor resolution and inconsistent color. Despite these hurdles, the first computer-vision gesture recognition system was reported in the 1980s (Premaratne 2014). The MIT-LED glove was developed at the MIT Media Laboratory in the early 1980s as part of a camera-based LED system to track body and limb position for real-time computer graphics animation (Sturman and Zeltzer 1994). The first hand recognition system that relied entirely on computer vision without markers, called DigitEyes, was reported by Rehg and Kanade in 1993 (Rehg and Kanade 1994).

Throughout history, the gaming industry has been one of the leading forces in bringing gesture control to the consumer market: from the Nintendo Power Glove in 1989, marketed as the future of game controllers (Lee 2011), to the PlayStation EyeToy in 2003, the Nintendo Wii in 2006 and the Xbox Kinect in 2010. Even though these devices do not fit the given definition of gesture control, they are still worth mentioning.

Around 2008, several companies started to focus on gesture control for the consumer electronics (CE) market. At IFA 2008, the second-largest CE show in the world (behind CES), Toshiba showed gesture control for TVs, and Samsung, JVC and Hitachi presented TVs with gesture control at CES in 2008 and 2009 (Shan 2010). Gesture control has also been implemented in mobile phones over the years. Sony Ericsson launched its Z555 in 2008 with the ability to mute or snooze the alarm by waving a hand at the built-in camera (Shan 2010). In 2019, Google launched the Pixel 4, which supports gesture control.

Leap Motion launched its sensor in 2011, letting users control their computer using gestures. The DJI Spark drone (nicknamed the “selfie drone”), launched in 2017, allows users to control the drone with their hand instead of a controller.

2.4 Technologies for Tracking Gestures

Gesture control requires advanced technology, and several different approaches have been tested over the years. This article will give an overview of the general technological approaches to gesture control (Fig. 1).

Fig. 1. Overview of technologies

Sensor Gloves.

Sensor gloves were the first technology made for gesture control, dating back to the 1970s. A sensor glove is in essence a wired interface with tactile or other sensory units attached to the fingers or joints of the glove worn by the user (Premaratne 2014). These gloves offer high accuracy but are also highly intrusive compared to other technologies.
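As a concrete illustration of the principle, the following minimal Python sketch shows how raw per-finger flex readings from such a glove might be normalized and thresholded into coarse posture labels. The raw sensor range, thresholds and gesture labels are illustrative assumptions, not taken from any particular glove.

```python
# Minimal sketch of sensor-glove posture classification.
# The raw sensor range, thresholds and gesture labels are illustrative
# assumptions, not taken from any specific glove.

RAW_MIN, RAW_MAX = 200, 900  # assumed raw flex readings: straight .. fully bent

def normalize(raw_readings):
    """Map raw per-finger flex readings to 0.0 (straight) .. 1.0 (fully bent)."""
    return [min(max((r - RAW_MIN) / (RAW_MAX - RAW_MIN), 0.0), 1.0)
            for r in raw_readings]

def classify_posture(bend):
    """Threshold five normalized bend values into a coarse posture label."""
    if all(b > 0.8 for b in bend):
        return "fist"
    if all(b < 0.2 for b in bend):
        return "open hand"
    if bend[1] < 0.2 and all(b > 0.8 for b in bend[2:]):
        return "pointing"  # index straight, middle/ring/little bent
    return "unknown"

# Example readings in the order thumb, index, middle, ring, little:
print(classify_posture(normalize([850, 230, 880, 870, 860])))  # -> pointing
```

A real glove driver would add per-user calibration and temporal smoothing, but the mapping from bend values to discrete gestures is essentially this kind of classification.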

Vision-Based Gesture Recognition.

In recent years, more and more research has concentrated on vision-based hand gesture recognition. Compared to sensor gloves, vision-based recognition is more natural and comfortable, as it does not constrain the flexibility of hand movements (Premaratne 2014). The downside to this technology is that it lacks the accuracy that modern sensor gloves provide.

2D Cameras:

With camera sensors becoming low-cost and pervasive in CE products, vision technologies, which allow unobtrusive and passive gesture sensing, are receiving increasing attention (Shan 2010). By using markers on the hands, either as colored gloves or as stickers on the fingers, the camera is able to track hand gestures accurately (Premaratne 2014). The problem with this approach is that wearing gloves makes it more intrusive and might be ineffective in certain use cases. As camera technology advances, more and more examples appear of systems that track gestures without markers, using algorithms alone, while still delivering high accuracy. A problem with 2D camera tracking is that one might lose accuracy due to self-occlusion, where the hand, seen from certain angles, overlaps itself (Premaratne 2014).
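To illustrate the marker-based approach, the sketch below (Python with OpenCV; the green marker and its HSV color range are assumptions made for illustration) segments a frame by marker color and computes the marker centroid, the quantity a real system would track from frame to frame.

```python
# Minimal sketch of marker-based hand tracking with OpenCV.
# The green marker and its HSV range are illustrative assumptions;
# a real setup would calibrate them for the actual glove/sticker color.
import cv2
import numpy as np

# Synthetic 480x640 BGR frame with a green "marker" patch so the sketch
# runs without a camera; replace with frames from cv2.VideoCapture(0).
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[200:240, 300:340] = (0, 255, 0)  # pure green patch

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
lower = np.array([50, 100, 100])   # assumed lower HSV bound for green
upper = np.array([70, 255, 255])   # assumed upper HSV bound for green
mask = cv2.inRange(hsv, lower, upper)  # binary mask of marker-colored pixels

m = cv2.moments(mask)
if m["m00"] > 0:  # marker found: centroid = first moments / zeroth moment
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    print(f"marker centroid: ({cx:.0f}, {cy:.0f})")  # roughly (319, 219)
```

Markerless trackers replace the color threshold with learned hand detectors, but the downstream tracking problem is the same.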

Stereo Cameras:

A stereo camera is a camera with two or more lenses that can simulate human binocular vision. Today, stereo cameras are implemented in high-end smartphones, but they previously required a stationary setup with several cameras. Stereo camera setups for gesture tracking use several viewpoints to obtain 3D tracking of the hand and bypass the problem of self-occlusion (Premaratne 2014). Stereo cameras are more expensive than 2D cameras.
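The depth recovery that motivates stereo setups follows the standard triangulation relation from computer vision (general background, not taken from the cited sources): for two parallel cameras with focal length f and baseline B, a hand point whose image positions differ by the disparity d lies at depth

\[ Z = \frac{f \cdot B}{d} \]

Points on the hand visible to only one camera yield no valid disparity, which is why the additional viewpoints help against self-occlusion.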

Radar Technology.

More recently, products using radar technology to track hand gestures have been released. Soli, Google's gesture-recognition radar chip, is an example of this. It is a high-resolution, low-power, miniature gesture sensing technology based on millimeter-wave radar, which operates on the principle of reflection and detection of radio-frequency electromagnetic waves (Lien et al. 2016). Even though this technology is very promising, its full potential is yet to be seen.
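The basic ranging principle behind such sensors (standard radar physics, not specific to Soli) is that an emitted electromagnetic wave reflected by the hand returns after a round-trip delay t, placing the hand at range

\[ R = \frac{c \cdot t}{2} \]

where c is the speed of light. Chips like Soli additionally analyze properties of the reflected signal, such as its Doppler shift, to resolve fine finger motion (Lien et al. 2016).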

Other Technologies.

Electromyographic (EMG) sensors have also been used for tracking gestures: the Myo wristband tracks hand gestures by sensing the activity of the muscles in one's arm (Sathiyanarayanan and Rajan 2016). Ultrasonic waves are a cheap alternative for tracking hand gestures (Kalgaonkar and Raj 2009), as are infrared sensors, which can be used standalone (Hillebrand et al. 2006; Kim et al. 2012; Megalingam et al. 2016) or combined with other inputs (Abraham et al. 2018).

Table 1 summarizes how the different technologies perform in relation to movability, mobility and accuracy. Movability refers to the amount of intrusion for the user (i.e. a glove is more intrusive than a free hand), mobility refers to whether the setup is stationary or mobile, and accuracy refers to how accurately the technology can track hand gestures. The classifications low, medium and high are relative to each other: low accuracy does not necessarily mean that a technology is inaccurate, only that it is less accurate than the others.

Table 1. Movability, Mobility and Accuracy of the given technologies

3 Findings

3.1 Gesture Control in Different Media

Gesture control can be used in a range of different domains, as presented below.

Gesture Control for Virtual Reality.

Gesture control for Virtual Reality (VR) has huge potential, as VR can give visual feedback to the user. Many VR systems depend on hand controllers to interact with the virtual world. These controllers try to give as much feedback to the user as possible by tracking some hand movements, such as rotation and some finger movements, but they restrict the user's movability. With accurate gesture control technology, one can map physical hands one-to-one in VR, giving the user a feeling of embodiment in the virtual world (Haans and Ijsselsteijn 2012).

Research has shown that engaging the user's motor system via the hands leads to a reduction in simulator sickness, which can be used as an argument for gesture control in VR (Stanney and Hash 1998).

Gesture Control for Augmented Reality.

In Augmented Reality (AR), visual information is layered on top of the world one sees, so a visual representation of the hands is not needed. When gestures are used to control something in the real world, however, AR can be highly beneficial for providing the visual feedback that is needed.

Gesture Control for Screens.

Many examples of gesture control for screens have been given in the section on history (Sect. 2.3). As humans have developed a well-known interaction pattern for interacting with screens, the arguments for adopting another type of interaction have to be substantial. Don Norman argues that:

Gestures will form a valuable addition to our repertoire of interaction techniques, but they need time to be better developed, for us to understand how best to deploy them, and for standard conventions to develop so the same gestures mean the same things in different systems (Norman 2010).

Gesture Control for Digital Objects.

Interacting with objects using one's hands is a natural part of human behavior, so using the hands to interact with digital objects and smart devices is natural as well. Digital objects are in this article defined as devices that do not have a designated interface. These can be electronics and devices in a smart home, such as lights, ovens, speakers, etc. Such digital objects often go under the term Internet of Things (IoT), and Talal H. Noor argues that “Gesture control can help IoT services' users to have a better experience when controlling the IoT products” (Noor 2018, p. 3894). In his article, he explores the technical possibility of users controlling IoT products by making signs with their fingertips (Noor 2018). By recognizing the user's eyes and finger, gestures can also be used to interact with a screen; this method makes the HCI more secure than a touch-based interface (Xu et al. 2018).

3.2 Potential Domains for Gesture Control

Gesture control has been explored in several different domains. This section will look at some domains where gesture control can have a big impact.

Gesture Control in Robotics.

Telerobotics allows operators to execute tasks from a safe distance and has been proven successful in many situations. Yet many tasks are still completed by human operators despite the presence of hazards and the costs associated with protective gear, training and additional waste disposal. Gesture control can offer a great advantage here, as workers still have performance and cognitive advantages over remote systems, including work-rate, adaptability, dexterity and minimal delay and latency issues during task planning and execution (Valner et al. 2018).

TeMoto is a teleoperation system that combines gesture control with voice control and a physical turn knob to interact with a telerobot. The system was tested on different robots in different scenarios: threading a needle with a robotic arm, and navigating a rover and controlling its arm. The results show that untrained operators can quickly understand and use TeMoto as well as the well-established mouse input, even to run systems too complex to be operated efficiently with mouse input alone (Valner et al. 2018).

Gesture Control in Healthcare.

A domain that can benefit greatly from gesture control is the healthcare sector, where it offers a major advantage in sterility (Wachs et al. 2008) as well as aiding health professionals in situations where conventional interfaces might not be sufficient. In a case study, researchers gathered ethnographic evidence from surgeons about the concept of gesture-based control over the display of their patients' radiographic scan data during surgery. This gave the surgeons direct access to their patients' scan data without compromising their sterile working field and without needing to rely on other clinicians to interpret display instructions (Stevenson et al. 2016). Another example of gesture control in healthcare is Gestix, a vision-based hand gesture capture and recognition system that interprets in real time the user's gestures for navigation and manipulation of images in an electronic medical record (EMR) database (Wachs et al. 2008). Ultigesture is yet another example: a low-cost wristband able to track hand gestures for simple navigation (Zhao et al. 2019). Using gesture control to operate a robotic microscope in surgery (Antoni et al. 2015), to steer laparoscopic instruments (Arkenbout et al. 2018) and to control operating lights (Hartmann and Schlaefer 2013) has also been explored.

Gesture Control for Space and Military.

Domains that impose physical restrictions on bodily movement have clear incentives for taking advantage of gesture control. Astronauts in space are one example. Research has been done using sensor gloves to control a snake-like robot in space (Liu et al. 2016). More recently, Ntention, a start-up at the Norwegian University of Science and Technology, collaborated with NASA and the SETI Institute on a glove that controls a drone using only hand gestures, made specifically for use in space exploration (McDonald 2019). Multimodal interaction combining gesture control and speech has been researched in the field of military technology to develop so-called “real-time soldier-robot teaming” (Barber et al. 2016).

Gesture Control in Automotive Sector.

Gesture control has increasingly been applied in the automotive industry to reduce the distraction that in-vehicle interactions cause to the primary task of driving (Ma et al. 2016). Studies show that gesture control indeed requires less visual attention away from the road (gaze aversion) (Ma et al. 2016; Zöller et al. 2018). On the other hand, the types of gestures that are necessary or wanted while driving are important to consider: users prefer not to use gestures to control functions that are directly related to their safety (such as adjusting the rearview mirror) or that require high precision (such as controlling an air vent) (Ma et al. 2016).

Gesture Control in Heavy Industry.

A huge domain that can benefit from gesture-based interaction is heavy industry. In construction, crane control systems can be improved with gesture control (Pietrusewicz 2014). Mobile robotic systems for remote leak sensing and localization can also benefit from gesture-based interaction (Soldan et al. 2012).

Gesture Control for Enhanced Learning.

An area where gesture control can be highly beneficial is education. It has been shown that gestures activate larger portions of the sensori-motor system and motoric pre-planning pathways than other modes of interaction do, and gestures may therefore lead to stronger memory traces (Goldin-Meadow 2011). Modern VR technology makes it possible to simulate situations and visualizations in 3D, letting the learner acquire knowledge faster and show better retention compared to 2D (Jeffrey 2011). By using the hands as controls with gestures, the learner gets the possibility not only to see 3D visualizations but also to interact with the content. This follows the concept of “learning by doing”, which works as a strong argument for gesture control in education. Engelkamp and Zimmer studied subject-performed tasks and found that when participants performed short tasks, the task-associated words were better remembered than when the participants read the words or saw others perform the tasks (Engelkamp and Zimmer 1994). When learners take physical decisions about the placement of content using representational gestures, they become “active learners”, which has been shown to increase STEM grades by 20% (Waldrop 2015). This research argues both for using gesture-based interaction directly for educational purposes and for using gesture control over conventional controllers for an enhanced learning effect.

4 Discussion

4.1 Changing Conventional Interaction Patterns Requires Good Incentives

Looking back at history, we have seen that gesture control is not something new. Implementing gesture-based interaction in the consumer market has been tried several times without making a substantial impact. Gesture control for TVs had incentives behind it: solving the problem of losing the remote, or of having to get out of the sofa to find it. However, the incentives were not impactful enough for people to learn this new way of interaction, and inaccurate, immature technology is arguably another reason why we do not see this type of interaction in TVs today. Mobile phones, computers and other screen-based CE offer even fewer incentives for implementing gesture-based interaction. The Magic Leap made a serious attempt at changing how we interact with computers, but it serves more as a tool for designers and researchers on the topic of gesture control than as a consumer product. Looking at these examples, one could argue that gesture control should not be applied where humans already have a well-established interaction pattern, unless there are sufficient incentives to change it.

4.2 What Medium Has the Biggest Potential for Gesture-Based Interaction?

Most screen-based interfaces lack the incentives to adopt gesture-based interaction, but there are some examples where this is not the case. In healthcare, where sterility is often required, gesture-based interaction can help health professionals interact with the same tools without compromising their working or hygienic environment.

Digital objects and IoT do not have an established interaction pattern, and tablets and computers are therefore often used to control these units. In this case, there are plenty of incentives to implement a new and more effective way of interacting. The problem is that without any interface, the user does not get the feedback that is needed. Layering information and feedback with AR can solve this problem, but as AR technology is not a common possession for the general consumer, it might not be the right time to apply it.

The need for feedback is one of six design principles by Norman (the others being visibility, affordance, mapping, constraints and consistency) (Norman 2013). These principles can be used as a reference when discussing the right medium for gesture-based interaction. Doing so, one can argue that VR and AR stand out as the preferred media, as they provide better visibility and affordance.

4.3 Where Should Gesture-Based Interaction Be Applied?

Some domains show clearer incentives for applying gesture-based interaction than others. Healthcare has already been mentioned as an interesting domain due to sterility, but many of its examples also overlap with the domain of robotics. Controlling robots, in any industry, usually relies on advanced controllers with sticks and levers. If designed correctly, gesture-based interaction can in these cases offer a more natural and intuitive form of interaction. Studies also show that using gestures rather than controllers leads to faster learning and better memorability (Goldin-Meadow 2011; Valner et al. 2018; Waldrop 2015).

Situations where the human body is physically restrained also present several incentives for applying gesture-based interaction. That could be operating drones in a pressurized spacesuit, controlling equipment under water or in hazardous environments, or use in military combat. Gesture-based interaction offers the benefit of requiring only small hand movements, and only one hand.

Gesture-based interaction in the automotive industry has some incentives for being useful and is implemented in several modern cars. Despite the arguments of less gaze aversion while driving and lower operating times than dashboard controllers (Zöller et al. 2018), the reward for learning this way of interaction lies only in operating minor controls in the car (i.e. volume, audio tracks, heat, etc.). One can therefore argue that even though there are incentives for applying gesture-based interaction here, this is not where it has the biggest potential. One can also speculate whether cars will drive themselves in the near future and no longer need human interaction at all.

4.4 What Complications Need to Be Considered When Designing for Gesture-Based Interaction?

Gestures are highly related to culture. In some countries a thumbs-up means good, while it might mean something completely different somewhere else. Cultural differences are something designers have to take into account when designing for gestures. A study testing gesture control for TVs across different countries concluded that its findings support the possibility of creating a global gesture language for most basic TV interactions (Meier et al. 2014). Taking account of cultural differences mainly applies when designing products that will reach the global market. On the other hand, it has been shown that user-defined gestures are preferable and more memorable than predefined ones (Malizia and Bellucci 2012; O'Hara et al. 2013), which might solve the problem of cultural differences.

Social acceptance is also something to consider when designing for gesture control. When it comes to consumer electronics, many would be uncomfortable using gestures in public spaces (Rico and Brewster 2010). People will also always be reluctant to learn something new; as argued before, there should be a clear incentive for people to switch from conventional interaction to gesture-based interaction.

The technology also has to be mature enough for gesture control to be accepted by the public. It is reasonable to expect that gesture tracking technology will be highly accurate and non-intrusive within a couple of years. Until then, the technology being used should fit the purpose of the application. If one depends on high accuracy but mobility is not a necessity, then a sensor glove might be the right technology to use. However, if movability is required and only simple gestures are being tracked, then a camera-based technology might be suitable. The technology we have today is definitely good enough to be considered a reliable tool for gesture-based interaction. Some technologies though, like AR and VR, are well developed but still expensive and not a common possession for the general consumer. This makes them more suitable when designing for a few experts (B2B) in certain domains than for the general public (B2C).

Even if the technology is mature, the market has to be as well. Timing and execution are therefore major aspects to consider when releasing new technology: it has to be released at a time when the market is ready for it, and it has to be executed properly. The Nintendo Power Glove is a good example of a great concept that lacked both mature technology (the glove was highly inaccurate and unresponsive) and proper execution (the Nintendo platform was not designed for this type of interaction). Nearly two decades later, Nintendo managed to deliver a technology that was mature enough and executed properly with the greatly successful Nintendo Wii (Lee 2011).

The lack of physical feedback when using gestures conflicts with Norman's design principles and can be used as an argument against the use of gesture-based interaction. In a study of controlling a surgical robot, the authors concluded that a touch-based interface was better suited for the task than gesture interaction, due to the lack of force feedback provided by contact with a surface, leading to a lack of precision, sensitivity and context (Zhou et al. 2016). With this in mind, gesture-based interaction might, in certain areas, not be suitable before better technology (such as sensor gloves with haptic feedback) is available.

5 Conclusion

This article has looked at the history of gesture control and at how and where it has been used in the past. Different media where gesture control can be used have been discussed, and promising domains where gesture control can make an impact have been presented.

There is no doubt that gesture-based interaction has large potential, but it has to be used in areas where there are clear incentives for changing a well-established interaction pattern. Different media and domains offer different challenges and possibilities, and designers and researchers have to consider this when making products for gesture control. When designing gesture control for global consumers, one has to take into account social acceptance, cultural diversity and the burden of learning something new. In all cases where this technology is applied, one has to consider the following: gesture-based interaction should not be used because it is possible, but because it is needed.