
1 Introduction

Gesture is an important form of nonverbal communication and conveys a variety of information [1]. With the development of computer technology, human-computer interaction has become an important channel between humans and the outside world. Gesture interaction, widely used in human communication to express various messages [2], has become a major trend in natural and harmonious human-computer interaction [3]. Natural human-computer interaction aims to achieve interaction as efficient as that among humans, which requires the interaction principles in human-computer interaction design to be consistent with human cognitive mechanisms [4, 5]. Therefore, exploring the cognitive mechanisms underlying gesture recognition will benefit natural and harmonious gesture interaction.

1.1 Two Types of Gesture

Gestures can be divided into manipulation gestures (transitive gestures) and meaningful gestures (intransitive gestures) [6]. Manipulation gestures are involved in the usage of tools (for example, waving a fist up and down means using a hammer) and are the result of interaction between humans and artifacts. Meaningful gestures carry certain meanings in human communication (for example, a thumbs-up gesture means approval or compliment) and are the result of interactions among people in society. These two types of gestures also differ in their neural basis. Previous research shows that the representation of manipulation gestures induces greater activation than meaningful gestures in the parietal cortex, which is involved in the representation and execution of artifact manipulation [7,8,9].

1.2 Two Manipulation Systems and Two Manipulation Gestures

Humans can manipulate artifacts in two ways: either reaching and grasping objects to move them, or using objects functionally [10]. Thus, based on different goals, manipulation can be divided into structure-based manipulation, which involves grasping and moving objects, and function-based manipulation, which involves using objects. The representation of structure-based manipulation depends on the bilateral dorso-dorsal pathway (the grasp system), while the representation of function-based manipulation relies on the left ventro-dorsal pathway (the use system) [11, 12]. The two manipulation representation systems have different characteristics grounded in distinct neural bases. The structure-based manipulation system processes visuospatial information and structural properties, such as object location, size, and shape, to form an action representation of how to grasp an object [13]. It depends on on-line input and occupies few cognitive resources, so it can be activated under unattended [14,15,16,17] or unconscious conditions [18, 19], and its activation decays rapidly [20]. By contrast, the function-based manipulation system extracts and stores the core features of skilled use actions to form an action representation of how to use an object [21,22,23], which can be stored in long-term memory. Its retrieval requires more cognitive resources, and its activation is an off-line process involving attention and consciousness [24, 25] that can last for a long time [20].

Further research suggests that the neural basis of the two manipulation systems overlaps with the distribution of mirror neurons. Mirror neurons are sensorimotor neurons that are activated during action execution as well as during action observation [26, 27]. Therefore, the two manipulation systems are involved not only in object manipulation but also in the recognition of manipulation gestures [28]. Recognition of structure-based manipulation gestures does not require retrieving an association between gestures and particular objects; it relies more on on-line input features that are low-level and unrelated to gesture meaning, and thus demands fewer cognitive resources. Function-based manipulation gestures, by contrast, have fixed associations with particular objects, so their recognition requires retrieving those associations from long-term memory, which demands more cognitive resources [12, 29, 30]. Therefore, the recognition of structure-based manipulation gestures is more sensitive to gesture features unrelated to gesture meaning, because surplus cognitive resources remain available during their recognition.

The present study aims to explore whether the processing of features unrelated to manipulation gesture meaning is modulated by the way gestures are presented. Gestures can be presented either statically or dynamically in a human-computer interface. Compared with static gestures, dynamic gestures contribute more information to the understanding of gesture meaning [31, 32] and demand more cognitive resources, which may reduce the processing of gesture features unrelated to gesture meaning.

1.3 Hypothesis

Based on previous research, the present study hypothesizes that gesture presentation influences the recognition of manipulation gestures by modulating the processing of low-level gesture features unrelated to gesture meaning. First, compared to static structure-based manipulation gestures, the recognition of dynamic structure-based manipulation gestures should not be influenced by features unrelated to gesture meaning, because dynamic presentation leaves insufficient cognitive resources for meaning-unrelated features. Second, compared to dynamic function-based manipulation gestures, the recognition of static function-based manipulation gestures should be influenced by features unrelated to gesture meaning, because static presentation leaves surplus cognitive resources for meaning-unrelated features.

Here, we consider gesture orientation and left/right hand information as low-level features that are unrelated to gesture meaning. Gestures can be oriented towards or away from the manipulable parts of tools, and can be performed with the left or the right hand. These features carry no information about gesture meaning, because people cannot understand a gesture's meaning from these features alone. If gesture recognition performance differs significantly across gesture orientations or between left- and right-hand gestures, then features unrelated to gesture meaning are processed during gesture recognition. Furthermore, if the influence of these meaning-unrelated features differs between static and dynamic presentation, then gesture presentation modulates the effect of low-level features on manipulation gesture recognition.

2 Method

2.1 Participants

Twenty-four undergraduate and graduate students (11 males, average age 23) participated and received payment after the experiment. All participants were right-handed and had normal or corrected-to-normal vision, and all were naïve to the purpose of the experiment.

2.2 Stimuli

We assigned two types of manipulation gestures to each object (see Fig. 1). The gesture stimuli consisted of 17 static gesture pictures (6 structure-based and 11 function-based gesture pictures) and 21 dynamic gesture videos (10 structure-based and 11 function-based gesture movie clips).

Fig. 1. Illustration of two manipulation gestures

Each gesture picture was circa 15 × 10 cm on the screen (horizontal visual angle about 9.5° and vertical visual angle about 6.3° at a viewing distance of about 90 cm). The 10 structure-based and 11 function-based gesture movie clips each lasted 2500 ms and were circa 17.8 × 10 cm on the screen (horizontal visual angle about 11.2° and vertical visual angle about 6.3° at the same viewing distance). The object stimuli consisted of 11 gray-scale pictures of familiar manipulable objects. Each object picture was circa 10 × 10 cm on the screen (visual angle about 6.3°). All stimuli were presented in the center of a 22-in. monitor with a resolution of 1024 × 768 pixels and a refresh rate of 100 Hz.
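The reported visual angles follow from the stimulus sizes and the 90 cm viewing distance via the standard visual-angle formula; a short Python sketch reproduces them:

# Visual angle = 2 * atan(size / (2 * distance)); sizes and distance in cm as reported above.
import math

def visual_angle_deg(size_cm, distance_cm=90.0):
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

print(visual_angle_deg(15.0))   # gesture picture width: ~9.5 degrees
print(visual_angle_deg(17.8))   # movie clip width: ~11.3 degrees (about 11.2 in the text)
print(visual_angle_deg(10.0))   # picture and object height: ~6.3 degrees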

The pictures and movie clips depicted gestures that were towards or away from the manipulable parts of tools (such as a tool's handle) and that were either left-hand or right-hand gestures (see Fig. 2). Gesture orientation and left/right hand information were features unrelated to manipulation gesture meaning.

Fig. 2. Gesture orientation and the left/right hand information

Gesture pictures or movie clips and manipulable objects were matched into 4 types of pairs: structure-congruent pair (gestures in the pictures or movie clips were appropriate to grasp the following objects), structure-incongruent pair (gestures in the pictures or movie clips were inappropriate to grasp the following objects), function-congruent pair (gestures in the pictures or movie clips were appropriate to use the following objects), and function-incongruent pair (gestures in the pictures or movie clips were inappropriate to use the following objects).

Norming of Stimuli.

Action congruency between gestures and objects in these pairs was rated on a 5-point scale. In the norming experiment, a gesture and a manipulable object were presented simultaneously, and participants judged to what extent the gesture was appropriate to grasp (structure-based manipulation) or use (function-based manipulation) the object on a scale from 1 to 5, where 1 indicated the gesture was very inappropriate and 5 indicated it was very appropriate for manipulating the object. The action congruency between gesture pictures and objects and that between gesture movie clips and objects were rated separately by different participants. Two-tailed paired-sample t tests revealed that the action congruency scores of congruent pairs were significantly higher than those of incongruent pairs (gesture pictures and objects, Table 1: structure-congruent 3.87, structure-incongruent 1.82, function-congruent 4.34, function-incongruent 1.89; gesture movie clips and objects, Table 2: structure-congruent 4.15, structure-incongruent 1.79, function-congruent 4.60, function-incongruent 1.11), indicating that the experimental manipulation was effective.

Table 1. Action congruency rating results of gesture pictures and objects
Table 2. Action congruency rating results of gesture movie clips and objects
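The paper does not report how the rating comparison was computed beyond the paired t test; purely as an illustration, such a comparison could be run in Python with SciPy, where the rating arrays below are hypothetical per-participant means rather than the actual data:

# Illustrative paired-sample t test on congruency ratings (placeholder numbers,
# not the actual rating data; the observed means are reported in Tables 1 and 2).
import numpy as np
from scipy.stats import ttest_rel

# One mean rating per participant for congruent and incongruent pairs (hypothetical values).
congruent = np.array([4.1, 3.9, 4.3, 4.0, 3.8])
incongruent = np.array([1.9, 2.1, 1.7, 1.8, 2.0])

t, p = ttest_rel(congruent, incongruent)   # two-tailed by default
print(f"t = {t:.3f}, p = {p:.4f}")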

2.3 Procedure

Participants sat in front of the monitor at a distance of about 90 cm. At the beginning of each trial, a fixation was presented for 500 ms, followed by a gesture picture or gesture movie clip lasting 2500 ms. This was followed by a 70 ms blank screen and then an object picture displayed for 80 ms. The object picture was then covered by a mask that remained on the screen until a response was made. Participants were instructed to judge whether the gesture was appropriate to grasp or use the object by pressing “f” or “j” (button assignment counterbalanced across participants) as accurately and quickly as possible (see Fig. 3). Participants were to respond “yes” in the congruent condition and “no” in the incongruent condition by pressing the corresponding button. Response accuracy and reaction time were recorded automatically by E-Prime 2.0.

Fig. 3. Procedure
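The task was implemented in E-Prime 2.0; purely as an illustration of the trial timing described above, a minimal PsychoPy sketch of a single trial (stimulus file names and window settings are hypothetical placeholders) could look as follows:

# Minimal sketch of one trial (illustrative only; the study used E-Prime 2.0).
from psychopy import visual, core, event

win = visual.Window(size=(1024, 768), units='pix', color='grey')
clock = core.Clock()

fixation = visual.TextStim(win, text='+')
gesture = visual.ImageStim(win, image='gesture_01.png')   # a MovieStim would be used for dynamic trials
target = visual.ImageStim(win, image='object_01.png')
mask = visual.ImageStim(win, image='mask.png')

fixation.draw(); win.flip(); core.wait(0.5)    # fixation: 500 ms
gesture.draw(); win.flip(); core.wait(2.5)     # gesture picture or clip: 2500 ms
win.flip(); core.wait(0.07)                    # blank screen: 70 ms
target.draw(); win.flip(); core.wait(0.08)     # object picture: 80 ms
mask.draw(); win.flip(); clock.reset()         # mask until response
key, rt = event.waitKeys(keyList=['f', 'j'], timeStamped=clock)[0]   # record key and RT

win.close()
core.quit()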

The main experiment consisted of 2 static gesture recognition blocks (a structure-based and a function-based gesture picture block) and 2 dynamic gesture recognition blocks (a structure-based and a function-based gesture movie block). The order of the 4 blocks was counterbalanced across participants. Half of the participants started the experiment with static gesture recognition and the other half with dynamic gesture recognition. Moreover, half of the participants starting with static gesture recognition completed the structure-based gesture picture block first and the other half completed the function-based gesture picture block first; the same arrangement was made for the participants starting with dynamic gesture recognition.

Each block consisted of 2 types of trials: congruent trials, in which gestures were suitable to manipulate the objects presented and participants should respond “yes”, and incongruent trials, in which gestures were unsuitable to manipulate the objects and participants should respond “no”. In congruent trials, gestures were either towards or away from the manipulable parts of tools, and were either left-hand or right-hand gestures. In incongruent trials, gestures could be left-hand or right-hand gestures. Therefore, each block contained 44 congruent trials (2 × 2 × 11) with “yes” as the correct response and 22 incongruent trials (2 × 11) with “no” as the correct response. Incongruent trials served as fillers and were not included in the analysis. The experiment used a within-subject design, and every participant completed a practice phase (20 trials, using stimuli other than those in the main experiment) and 4 blocks consisting of 264 trials in total. The whole experiment lasted about 40 min.
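As a quick check of the design arithmetic, a short sketch (with illustrative factor labels) reproduces the per-block and total trial counts:

# Sketch of the trial bookkeeping per block, using the factor levels described above.
from itertools import product

objects = range(11)                    # 11 manipulable objects
orientations = ['towards', 'away']     # gesture orientation
hands = ['left', 'right']              # left/right hand information

congruent = list(product(orientations, hands, objects))   # 2 x 2 x 11 = 44 "yes" trials
incongruent = list(product(hands, objects))                # 2 x 11 = 22 "no" filler trials

trials_per_block = len(congruent) + len(incongruent)       # 66
total_trials = 4 * trials_per_block                        # 264 across the 4 blocks
print(trials_per_block, total_trials)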

3 Results

We analyzed the data using SPSS 22.0. The analysis was restricted to congruent trials, for which the correct response was “yes”. Mean recognition accuracy of static and dynamic gestures in each condition was computed for every subject and is presented in Table 3 (structure-based manipulation gestures) and Table 4 (function-based manipulation gestures). Repeated-measures ANOVAs on structure-based and function-based manipulation gesture recognition accuracy, with gesture presentation (static or dynamic) and gesture orientation (towards or away from the manipulable part of objects) as within-subject factors, were conducted to explore whether the processing of gesture orientation would be modulated by gesture presentation. In addition, repeated-measures ANOVAs on structure-based and function-based gesture recognition accuracy, with gesture presentation (static or dynamic) and left/right hand information (left or right hand) as factors, were conducted to explore the influence of gesture presentation on the processing of left/right hand information.

Table 3. Recognition accuracy of structure-based manipulation gesture
Table 4. Recognition accuracy of function-based manipulation gesture
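The ANOVAs were run in SPSS 22.0; purely as an illustration, the same 2 (presentation) × 2 (orientation) repeated-measures analysis could be reproduced in Python with statsmodels, assuming a hypothetical long-format file of per-condition cell means:

# Illustrative 2 x 2 repeated-measures ANOVA on recognition accuracy
# (the study used SPSS 22.0; file name and column names are hypothetical).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long format: one row per subject x presentation x orientation cell mean.
df = pd.read_csv('accuracy_long.csv')   # columns: subject, presentation, orientation, accuracy

res = AnovaRM(data=df, depvar='accuracy', subject='subject',
              within=['presentation', 'orientation']).fit()
print(res.anova_table)                  # F, df, and p for both main effects and the interaction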

3.1 Structure-Based Manipulation Gestures

Recognition accuracy of structure-based manipulation gestures with different gesture orientation in static and dynamic presentation. The repeated-measures ANOVA by subject revealed a significant main effect of gesture orientation, F1(1, 23) = 11.086, p = .003, \( \eta_{p}^{2} = .325 \), and a significant interaction between gesture presentation and gesture orientation, F1(1, 23) = 11.931, p = .002, \( \eta_{p}^{2} = .342 \). Simple effect analysis showed that only for dynamic structure-based manipulation gestures was the recognition accuracy of gestures towards the manipulable part of objects (M = 0.837) significantly higher than that of gestures away from the manipulable part (M = 0.758), p < .001, indicating that the gesture orientation of structure-based manipulation gestures was processed in dynamic but not in static presentation (Fig. 4, left panel). The main effect of gesture presentation was not significant, F1(1, 23) = 2.595, p = .121, \( \eta_{p}^{2} = .101 \).

Fig. 4. Recognition accuracy of structure-based manipulation gestures with different gesture orientation (left panel) and different left/right hand information (right panel), *p < .05, error bars: ±1 SE.

The repeated-measures ANOVA by item showed the same pattern: the main effect of gesture orientation was significant, F2(1, 20) = 9.546, p = .006, \( \eta_{p}^{2} = .323 \), while the main effect of gesture presentation was not, F2(1, 20) = 0.74, p = .40, \( \eta_{p}^{2} = .036 \). The interaction between gesture presentation and gesture orientation was also significant, F2(1, 20) = 5.916, p = .025, \( \eta_{p}^{2} = .228 \). Simple effect analysis showed that for dynamic structure-based manipulation gestures, the recognition accuracy of gestures towards the manipulable part of objects (M = 0.837) was significantly higher than that of gestures away from the manipulable part (M = 0.758), p = .001, but this effect was not found for static structure-based manipulation gestures.

Recognition accuracy of structure-based manipulation gestures with different left/right hand information in static and dynamic presentation. The repeated-measures ANOVA by subject found only a significant interaction between gesture presentation and left/right hand information, F1(1, 23) = 7.389, p = .012, \( \eta_{p}^{2} = .243 \). Simple effect analysis showed that only for static structure-based manipulation gestures was the recognition accuracy of left-hand gestures (M = 0.767) significantly higher than that of right-hand gestures (M = 0.723), p = .019, indicating that the left/right hand information of structure-based manipulation gestures was processed in static but not in dynamic presentation (Fig. 4, right panel). Neither the main effect of gesture presentation, F1(1, 23) = 2.617, p = .119, \( \eta_{p}^{2} = .102 \), nor that of left/right hand information, F1(1, 23) = 1.490, p = .235, \( \eta_{p}^{2} = .061 \), was significant.

The repeated-measures ANOVA by item did not mirror the by-subject results: no significant main effect or interaction was found.

3.2 Function-Based Manipulation Gestures

Recognition accuracy of function-based manipulation gestures with different gesture orientation in static and dynamic presentation. The repeated-measures ANOVA by subject revealed a significant main effect of gesture presentation, F1(1, 23) = 37.776, p < .001, \( \eta_{p}^{2} = .622 \), a significant main effect of gesture orientation, F1(1, 23) = 5.813, p = .024, \( \eta_{p}^{2} = .202 \), and a marginally significant interaction between gesture presentation and gesture orientation, F1(1, 23) = 3.423, p = .077, \( \eta_{p}^{2} = .130 \). Simple effect analysis showed that only for static function-based manipulation gestures was the recognition accuracy of gestures towards the manipulable part of objects (M = 0.778) significantly higher than that of gestures away from the manipulable part (M = 0.722), p = .022, indicating that the gesture orientation of function-based manipulation gestures was processed in static but not in dynamic presentation (Fig. 5, left panel). In addition, recognition accuracy was significantly higher for dynamic than for static gestures.

Fig. 5. Recognition accuracy of function-based manipulation gestures with different gesture orientation (left panel) and different left/right hand information (right panel), *p < .05, error bars: ±1 SE.

The repeated-measures ANOVA by item also revealed a significant main effect of gesture presentation, F2(1, 20) = 13.806, p = .001, \( \eta_{p}^{2} = .408 \), a significant main effect of gesture orientation, F2(1, 20) = 10.041, p = .005, \( \eta_{p}^{2} = .334 \), and a significant interaction between gesture presentation and gesture orientation, F2(1, 20) = 5.123, p = .035, \( \eta_{p}^{2} = .204 \). Simple effect analysis showed that only for static function-based manipulation gestures was the recognition accuracy of gestures towards the manipulable part of objects (M = 0.778) significantly higher than that of gestures away from the manipulable part (M = 0.722), p = .001.

Recognition accuracy of function-based manipulation gestures with different left/right hand information in static and dynamic presentation. The repeated-measures ANOVA by subject found only a significant main effect of gesture presentation, F1(1, 23) = 37.410, p < .001, \( \eta_{p}^{2} = .619 \), with significantly higher recognition accuracy in dynamic presentation. The main effect of left/right hand information was not significant, F1(1, 23) = 3.354, p = .080, \( \eta_{p}^{2} = .127 \), nor was the interaction between gesture presentation and left/right hand information, F1(1, 23) = 0.004, p = .953, \( \eta_{p}^{2} < .001 \), indicating that the left/right hand information of function-based manipulation gestures was not processed in either presentation mode (Fig. 5, right panel).

The repeated-measures ANOVA by item also found a significant main effect of gesture presentation, F2(1, 20) = 13.581, p = .001, \( \eta_{p}^{2} = .404 \), and a marginally significant main effect of left/right hand information, F2(1, 20) = 4.158, p = .055, \( \eta_{p}^{2} = .172 \). The interaction between gesture presentation and left/right hand information, however, was not significant, F2(1, 20) = 0.034, p = .855, \( \eta_{p}^{2} = .002 \).

4 Discussion

The present study aimed to explore whether gesture presentation can influence the recognition of manipulation gestures by modulating the processing of low-level features unrelated to gesture meaning, such as gesture orientation and left/right hand information. The results showed that gesture presentation influenced the processing of meaning-unrelated features in the recognition of function-based manipulation gestures and thereby facilitated recognition: static function-based manipulation gesture recognition was influenced by gesture orientation, whereas dynamic function-based manipulation gesture recognition, with higher recognition accuracy, was not influenced by either meaning-unrelated feature. This suggests that dynamic presentation can effectively avoid the processing of meaning-unrelated features and thus improve the recognition accuracy of function-based manipulation gestures. For structure-based manipulation gestures, however, the processing of meaning-unrelated features remained in both static and dynamic presentation, and dynamic presentation failed to improve recognition performance, which was inconsistent with the hypothesis: static structure-based manipulation gesture recognition was influenced by left/right hand information, and dynamic structure-based manipulation gesture recognition was influenced by gesture orientation. This indicates that dynamic presentation cannot completely prevent the processing of meaning-unrelated features.

4.1 Different Effect of Presentation on Two Manipulation Gesture Recognition

The results indicated that static and dynamic gesture presentation can affect the recognition of manipulation gestures by modulating the influence of gesture features unrelated to gesture meaning, and that this modulation differs between structure-based and function-based manipulation gestures.

Recognition of function-based manipulation gestures depends more on information stored in long-term memory than on on-line low-level input features, and therefore requires more cognitive resources [12,13,14,15]. Static presentation, which contains limited information, reduced the cognitive resources occupied by function-based gesture recognition, and the surplus resources were devoted to processing meaning-unrelated gesture features, which in turn influenced the recognition of static gestures. Conversely, dynamic presentation, which contains rich information, attracted all cognitive resources to the processing of gesture meaning, leaving insufficient resources for processing features unrelated to gesture meaning.

For structure-based manipulation gestures, inconsistent with the hypothesis, recognition was still affected by low-level features unrelated to gesture meaning even in dynamic presentation. This might be attributed to the reliance on on-line input and the limited cognitive resource demand of structure-based manipulation processing [30]. As a result, recognition did not absorb all cognitive resources, and the surplus resources could be devoted to processing meaning-unrelated features regardless of the presentation mode.

4.2 Implications for Application

The present results demonstrated that gesture presentation affects the recognition of structure-based and function-based manipulation gestures differently by modulating the processing of gesture features unrelated to gesture meaning, which provides a reference for manipulation gesture presentation in human-computer interaction. They suggest that dynamic presentation of function-based manipulation gestures is the better option for improving recognition efficiency. Successful recognition of structure-based manipulation gestures, in turn, demands control of features unrelated to gesture meaning, such as gesture orientation and left/right hand information. For example, in virtual simulation environments, grasping actions can be shown by a virtual hand [33, 34]. The strong immersion of such environments requires accurate action expression by the virtual hand [35], as well as effective gesture recognition by users. Therefore, according to the present results, whether virtual hands are oriented towards or away from the objects in the scene should be taken into consideration to enable effective human-computer interaction.

5 Conclusion

The present results indicate that static and dynamic gesture presentation affect the recognition of structure-based and function-based manipulation gestures in different ways. They suggest that dynamic presentation of function-based manipulation gestures is the better option in human-computer interaction, and that gesture features unrelated to gesture meaning should be controlled in the presentation of structure-based manipulation gestures, in order to ensure successful gesture recognition. These findings have theoretical implications for the design of gesture interaction methods in human-computer interaction.