Studies in automated hand gesture analysis: an overview of functional types and gesture phases

Abstract

This paper presents an overview of studies on automated hand gesture analysis, mainly concerned with recognition and segmentation issues related to functional types and gesture phases. The issues selected for discussion are arranged so as to take account of the problems within the Theory of Gestures that each study seeks to address. The principal computational factors involved in each study's automated hand gesture analysis are examined, and open research issues are analyzed for each application dealt with in the studies.

Notes

  1. Since each section presents a different aspect of the studies, some studies are covered in more than one section, each time with respect to the aspect that section addresses.

  2. The objectives of Gesture Studies are discussed in Sect. 3.

  3. This framework is summarized in McNeill (1992) and Kita et al. (1998).

  4. Handedness refers to the hand(s) used in a gesture: left hand, right hand, or both.

  5. The fundamental frequency contour and the intensity contour represent, respectively, the pitch (or intonation) and the loudness of the signal (Pokorny 2011). A minimal extraction sketch follows these notes.

  6. In this paper, the results are not detailed for each class. Thus, the results correspond to the average accuracy rate of the preparation, contour, pointing, and retraction classes.

  7. Recall is a performance measure that evaluates how many of all positive instances (rest positions, in this case) were classified as positive.

  8. Precision is a performance measure that evaluates how many of the instances classified as positive (rest positions, in this case) were truly positive. Both measures are illustrated in the second sketch following these notes.

  9. The values for precision and recall are not explicitly presented in Wilson et al. (1996). These values (precision of 82% and recall of 79%) were estimated by the authors of this paper from a graphical analysis of the figure that compares a video annotated by a person and a video annotated by the heuristic.

  10. When a gesture is performed by both hands, if both hands perform the same movement (even in different directions), the gesture is symmetric; otherwise, it is asymmetric. Both features are obtained from the manual transcripts made by researchers, not by computational methods.

  11. http://archive.ics.uci.edu/ml/index.html.

  12. http://research.microsoft.com/en-us/um/people/zliu/actionrecorsrc.

  13. For instance, one annotator may describe a movement as an outward movement while another may describe it as a front-outward movement.
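As a complement to note 5, the sketch below shows one way such contours might be extracted in practice. It is a minimal illustration, not taken from Pokorny (2011) or any of the surveyed studies; the use of the librosa library, the input file name, and the parameter choices are all assumptions made for the example.

```python
# Minimal sketch (illustrative, not from the surveyed studies): extracting a
# fundamental frequency (pitch) contour and an intensity (loudness) contour
# from a speech signal, here using the librosa library.
import librosa
import numpy as np

# "speech.wav" is a hypothetical input file.
y, sr = librosa.load("speech.wav", sr=None)

# Fundamental frequency contour via the pYIN algorithm (NaN where unvoiced).
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Intensity contour approximated by frame-wise RMS energy, converted to dB.
rms = librosa.feature.rms(y=y)[0]
intensity_db = librosa.amplitude_to_db(rms, ref=np.max)

print(f"mean F0: {np.nanmean(f0):.1f} Hz")
print(f"mean intensity: {intensity_db.mean():.1f} dB")
```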
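To make the recall and precision measures of notes 7 and 8 concrete, here is a minimal sketch of both computations for a hypothetical frame-level rest-position detector. The function names and toy label sequences are assumptions made for this example, not material from the surveyed studies.

```python
# Minimal sketch (illustrative): recall and precision for a binary
# rest-position detector. Labels: 1 = rest position (positive class),
# 0 = any other gesture phase.

def recall(y_true, y_pred):
    # TP / (TP + FN): share of all true rest positions that were detected.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(y_true, y_pred):
    # TP / (TP + FP): share of detected rest positions that truly are rest.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical frame-level annotations: ground truth vs. detector output.
truth     = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(f"recall    = {recall(truth, predicted):.2f}")      # 0.75
print(f"precision = {precision(truth, predicted):.2f}")   # 0.75
```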

References

  • Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11), 832–843.

  • Allwood, J., Cerrato, L., Dybkjaer, L., Jokinen, K., Navarretta, C., & Paggio, P. (2004). The MUMIN multimodal coding scheme. Technical report, University of Copenhagen. http://www.cst.dk/mumin/.

  • Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596.

  • Brugman, H., & Russel, A. (2004). Annotating multimedia/multi-modal resources with ELAN. In Proceedings of the 4th International Conference on Language Resources and Evaluation (pp. 2065–2068).

  • Bryll, R., Quek, F., & Esposito, A. (2001). Automatic hand hold detection in natural conversation. In Proceedings of the IEEE Workshop on Cues in Communication (pp. 1–6).

  • Bull, P. (1986). The use of hand gesture in political speeches: Some case studies. Journal of Language and Social Psychology, 5(2), 103–118.

  • Bunt, H. (2009). Multifunctionality and multidimensional dialogue semantics. Keynote at the Workshop on the Semantics and Pragmatics of Dialogue.

  • Carson-Berndsen, J. (1998). Time map phonology: Finite state models and event logics in speech recognition. New York: Kluwer Academic Publishers.

  • Chen, L., Harper, M., & Quek, F. (2002). Gesture patterns during speech repairs. In Proceedings of 4th IEEE International Conference on Multimodal Interfaces, IEEE Computer Society (pp. 155–160).

  • Chen, L., Liu, Y., Harper, M. P., & Shriberg, E. (2004). Multimodal model integration for sentence unit detection. In Proceedings of the 6th international conference on Multimodal interfaces (pp. 121–128). New York: ACM Press.

  • Colgan, S. E., Lanter, E., McComish, C., Watson, L. R., Crais, E. R., & Baranek, G. T. (2006). Analysis of social interaction gestures in infants with autism. Child Neuropsychology, 12(4–5), 307–319.

  • Dias, D. B., Madeo, R. C. B., Rocha, T., Bíscaro, H. H., & Peres, S. M. (2009). Hand movement recognition for Brazilian Sign Language: A study using distance-based neural networks. In Proceedings of the International Joint Conference on Neural Networks, IEEE (pp. 697–704).

  • Eisenstein, J., Barzilay, R., & Davis, R. (2008). Gesture salience as a hidden variable for coreference resolution and keyframe extraction. Journal of Artificial Intelligence Research, 31, 353–398.

  • Fauconnier, G., & Turner, M. (2003). Conceptual blending, form and meaning. Recherches en Communication, 19, 57–86.

  • Garnham, A. (1994). Psycholinguistics: Central topics. London: Routledge.

  • Gebre, B. G., Wittenburg, P., & Lenkiewicz, P. (2012). Towards automatic gesture stroke detection. In Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul.

  • Gibbon, D. (2009). Gesture theory is linguistics: On modelling multimodality as prosody. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (pp. 9–18).

  • Gibbon, D. (2006). Time types and time trees: Prosodic mining and alignment of temporally annotated data. Berlin: Walter de Gruyter.

  • Gibbon, D., Gut, U., Hell, B., Looks, K., Thies, A., & Trippel, T. (2003). A computational model of arm gestures in conversation. In Proceedings of Eurospeech 2003 (pp. 813–816).

  • Grynszpan, O., Martin, J. C., & Oudin, N. (2003). On the annotation of gestures in multimodal autistic behaviour. In The 5th International Workshop on Gesture and Sign Language based Human-Computer Interaction (pp. 25–33).

  • Gullberg, M. (2010). Methodological reflections on gesture analysis in second language acquisition and bilingualism research. Second Language Research, 26(1), 75–102.

  • Hampton, J. A. (1981). An investigation of the nature of abstract concepts. Memory and Cognition, 9(2), 149–156.

  • ISGS (2013). ISGS: International Society for Gesture Studies. http://www.gesturestudies.com. Accessed 30 Apr 2013.

  • Kadous, M. W. (2002). Temporal classification: Extending the classification paradigms to multivariate time series. Ph.D. thesis, School of Computer Science and Engineering, University of New South Wales.

  • Kahol, K., Tripathi, P., & Panchanathan, S. (2004). Automated gesture segmentation from dance sequences. In Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition, IEEE (pp. 883–888).

  • Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), The relationship of verbal and nonverbal communication (pp. 207–227). The Hague: Mouton Publishers.

  • Kendon, A. (2005). Gesture: Visible action as utterance. Cambridge: Cambridge University Press.

  • Kettebekov, S. (2004). Exploiting prosodic structuring of coverbal gesticulation. In Proceedings of the 6th International Conference on Multimodal interfaces (pp. 105–112). New York, NY: ACM Press.

  • Kettebekov, S., Yeasin, M., Krahnstoever, N., & Sharma, R. (2002). Prosody based co-analysis of deictic gestures and speech in weather narration broadcast. In Proceedings of the Workshop on Multimodal Resources and Multimodal System Evaluation.

  • Kettebekov, S., Yeasin, M., & Sharma, R. (2005). Prosody based audiovisual coanalysis for coverbal gesture recognition. IEEE Transactions on Multimedia, 7(2), 234–242.

  • Kipp, M. (2012). Multimedia annotation, querying, and analysis in ANVIL. London: Wiley.

  • Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In I. Wachsmuth & M. Fröhlich (Eds.), Gesture and sign language in human-computer interaction (Lecture notes in computer science, Vol. 1371, pp. 23–35). Berlin: Springer.

  • Krippendorff, K. (2004). Content analysis: An introduction to its methodology. Thousand Oaks, CA: Sage.

  • Liddell, S. K. (2003). Grammar, gesture, and meaning in American Sign Language. Cambridge: Cambridge University Press.

  • Liu, J., & Kavakli, M. (2010). A survey of speech-hand gesture recognition for the development of multimodal interfaces in computer games. In Proceedings of the 2010 IEEE International Conference on Multimedia and Expo (pp. 1564–1569).

  • Lyons, J. (1977). Semantics. Cambridge: Cambridge University Press.

  • Madeo, R. C. B., Lima, C. A. M., & Peres, S. M. (2013). Gesture unit segmentation using support vector machines: Segmenting gestures from rest positions. In Proceedings of 28th Annual ACM Symposium on Applied Computing (pp. 114–121).

  • Margolis, E. (1994). A reassessment of the shift from the classical theory of concepts to prototype theory. Cognition, 51(1), 73–89.

  • Maricchiolo, F., Gnisci, A., & Bonaiuto, M. (2012). Coding hand gestures: A reliable taxonomy and a multi-media support. In Cognitive behavioural systems (pp. 405–416). Springer.

  • Martell, C. (2002). FORM: An extensible, kinematically-based gesture annotation scheme. In Proceedings of the 2002 International Conference on Language Resources and Evaluation, Istanbul.

  • Martell, C. (2005). FORM: An experiment in the annotation of the kinematics of gesture. Ph.D. thesis, University of Pennsylvania.

  • Martell, C. H., & Kroll, J. (2007). Corpus-based gesture analysis: An extension of the FORM dataset for the automatic detection of phases in a gesture. International Journal of Semantic Computing, 1, 521–536.

  • McNeill, D. (1992). Hand and mind: What the hands reveal about thought. Chicago: University of Chicago Press.

  • McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.

  • McNeill, D., Quek, F., McCullough, K., Duncan, S., Furuyama, N., Bryll, R., et al. (2001). Catchments, prosody and discourse. Gesture, 1(1), 9–33.

  • Mitra, S., & Acharya, T. (2007). Gesture recognition: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 37(3), 311–324.

  • Mol, L., Krahmer, E., & van de Sandt-Koenderman, M. (2012). Gesturing by aphasic speakers, how does it compare? Journal of Speech, Language and Hearing Research, 56(4), 1224–1236.

  • Moni, M., & Ali, A. B. M. S. (2009). HMM based hand gesture recognition: A review on techniques and approaches. In Proceedings of the 2nd IEEE International Conference on Computer Science and Information Technology (pp. 433–437).

  • Petukhova, V., & Bunt, H. (2009). Dimensions of communication. Technical Report TiCC TR 2009–003, Tilburg University.

  • Pickering, C. (2005). The search for a safer driver interface: A review of gesture recognition human machine interface. Computing & Control Engineering Journal, 16(1), 34–40.

  • Pokorny, F. (2011). Extraction of prosodic features from speech signals.

  • Quek, F., McNeill, D., Ansari, R., Ma, X., Bryll, R., Duncan, S., & McCullough, K. (1999). Gesture cues for conversational interaction in monocular video. In Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, IEEE (pp. 119–126).

  • Quek, F. (2004). The catchment feature model: A device for multimodal fusion and a bridge between signal and sense. EURASIP Journal on Advances in Signal Processing, 2004(11), 1619–1636.

  • Quek, F., McNeill, D., Bryll, R., Duncan, S., Ma, X., Kirbas, C., et al. (2002). Multimodal human discourse: Gesture and speech. ACM Transactions on Computer Human Interaction (TOCHI), 9(3), 171–193.

  • Ramakrishnan, A. S. (2011). Segmentation of hand gestures using motion capture data. Master’s thesis, University of California.

  • Rossini, N. (2004). The analysis of gesture: Establishing a set of parameters. In Gesture-Based Communication in Human-Computer Interaction: 5th International Gesture Workshop (pp. 124–131).

  • Rossini, N. (2012). Reinterpreting gestures as language: Language in action. Amsterdam: IOS Press.

  • Scherr, R. E. (2008). Gesture analysis for physics education researchers. Physical Review Special Topics - Physics Education Research, 4(1), 010101.

  • Sowa, T. (2008). The recognition and comprehension of hand gestures: A review and research agenda. In Proceedings of the Embodied Communication in Humans and Machines, 2nd ZiF Research Group International Conference on Modeling Communication with Robots and Virtual Humans (pp. 38–56). Berlin: Springer.

  • Theodoridis, T., & Hu, H. (2007). Action classification of 3D human models using dynamic ANNs for mobile robot surveillance. In International Conference on Robotics and Biomimetics (pp. 371–376).

  • Wilson, A., Bobick, A., & Cassell, J. (1996). Recovering the temporal structure of natural gesture. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition (pp. 66–71).

  • Xiong, Y., Quek, F., & McNeill, D. (2002). Hand gesture symmetric behavior detection and analysis in natural conversation. In Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, IEEE Computer Society (pp. 179–184).

  • Xiong, Y., & Quek, F. (2006). Hand motion gesture frequency properties and multimodal discourse analysis. International Journal of Computer Vision, 69(3), 353–371.

  • Yang, F. J. (2011). The talking hands? The relation between gesture and language in aphasic patients. Ph.D. thesis, University of Trento.

Acknowledgements

Renata C. B. Madeo thanks the São Paulo Research Foundation for financial support (process number 2011/04608-8).

Author information

Correspondence to Sarajane M. Peres.

Cite this article

Madeo, R.C.B., Lima, C.A.M. & Peres, S.M. Studies in automated hand gesture analysis: an overview of functional types and gesture phases. Lang Resources & Evaluation 51, 547–579 (2017). https://doi.org/10.1007/s10579-016-9373-4
