Abstract
This paper presents an overview of studies on automated hand gesture analysis, focusing on recognition and segmentation issues related to functional types and gesture phases. The issues selected for discussion are organized according to the problems within the Theory of Gestures that each study seeks to address. The principal computational factors involved in each automated hand gesture analysis are examined, and open research issues are identified for each application addressed in the studies.
Notes
Since each section presents a different aspect of the studies, some studies are covered in more than one section, each section addressing the pertinent aspects of the studies.
The objectives of Gesture Studies are discussed in Sect. 3.
Handedness refers to the hand(s) used in a gesture: the left hand, the right hand, or both.
The fundamental frequency contour and the intensity contour represent, respectively, the pitch (or intonation) and the loudness of the signal (Pokorny 2011).
In this paper, the results are not detailed for each class. Thus, the results correspond to the average accuracy rate of the preparation, contour, pointing, and retraction classes.
Recall is a performance measure that evaluates how many of all the positive instances (rest positions, in this case) were classified as positive.
Precision is a performance measure that evaluates how many of the instances classified as positive (rest positions, in this case) were truly positive.
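These two definitions can be illustrated with a short sketch (not taken from any of the reviewed studies; the per-frame labels below are hypothetical) that computes precision and recall for a binary "rest position" detector:

```python
def precision_recall(actual, predicted, positive="rest"):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # detected among actual positives
    return precision, recall

# Hypothetical per-frame annotations: "rest" vs. "gesture"
actual    = ["rest", "rest", "gesture", "rest", "gesture", "gesture"]
predicted = ["rest", "gesture", "gesture", "rest", "rest", "gesture"]
p, r = precision_recall(actual, predicted)
# Of the 3 frames predicted "rest", 2 are truly rest (precision = 2/3);
# of the 3 frames that are truly "rest", 2 were detected (recall = 2/3).
```

This is the same convention under which the precision and recall figures for Wilson et al. (1996) are reported below.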
The values for precision and recall are not explicitly presented in Wilson et al. (1996). These values (precision of 82% and recall of 79%) were estimated by the authors of this paper from a graphical analysis of the figure that compares a video annotated by a person with a video annotated by the heuristic.
When a gesture is performed by both hands, it is symmetric if both hands perform the same movement (even in different directions); otherwise, it is asymmetric. Both features are obtained from the manual transcripts made by researchers, not by computational methods.
For instance, one annotator may describe a movement as an outward movement while another may describe it as a front-outward movement.
References
Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11), 832–843.
Allwood, J., Cerrato, L., Dybkjaer, L., Jokinen, K., Navarretta, C., & Paggio, P. (2004). The mumin multimodal coding scheme. Technical report, University of Copenhagen. http://www.cst.dk/mumin/.
Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34, 555–596.
Brugman, H., & Russel, A. (2004). Annotating multimedia/multi-modal resources with ELAN. In Proceedings of the 4th International Conference on Language Resources and Evaluation (pp. 2065–2068).
Bryll, R., Quek, F., & Esposito, A. (2001). Automatic hand hold detection in natural conversation. In Proceedings of the IEEE Workshop on Cues in Communication (pp. 1–6).
Bull, P. (1986). The use of hand gesture in political speeches: Some case studies. Journal of Language and Social Psychology, 5(2), 103–118.
Bunt, H. (2009). Multifunctionality and multidimensional dialogue semantics. Keynote at the Workshop on the Semantics and Pragmatics of Dialogue.
Carson-Berndsen, J. (1998). Time map phonology: Finite state models and event logics in speech recognition. New York: Kluwer Academic Publishers.
Chen, L., Harper, M., & Quek, F. (2002). Gesture patterns during speech repairs. In Proceedings of 4th IEEE International Conference on Multimodal Interfaces, IEEE Computer Society (pp. 155–160).
Chen, L., Liu, Y., Harper, M. P., & Shriberg, E. (2004). Multimodal model integration for sentence unit detection. In Proceedings of the 6th international conference on Multimodal interfaces (pp. 121–128). New York: ACM Press.
Colgan, S. E., Lanter, E., McComish, C., Watson, L. R., Crais, E. R., & Baranek, G. T. (2006). Analysis of social interaction gestures in infants with autism. Child Neuropsychology, 12(4–5), 307–319.
Dias, D. B., Madeo, R. C. B., Rocha, T., Bíscaro, H. H., & Peres, S. M. (2009). Hand movement recognition for Brazilian Sign Language: A study using distance-based neural networks. In Proceedings of the International Joint Conference on Neural Networks, IEEE (pp. 697–704).
Eisenstein, J., Barzilay, R., & Davis, R. (2008). Gesture salience as a hidden variable for coreference resolution and keyframe extraction. Journal of Artificial Intelligence Research, 31, 353–398.
Fauconnier, G., & Turner, M. (2003). Conceptual blending, form and meaning. Recherches en Communication, 19, 57–86.
Garnham, A. (1994). Psycholinguistics: Central topics. London: Routledge.
Gebre, B. G., Wittenburg, P., & Lenkiewicz, P. (2012). Towards automatic gesture stroke detection. In Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul.
Gibbon, D. (2009). Gesture theory is linguistics: On modelling multimodality as prosody. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (pp. 9–18).
Gibbon, D. (2006). Time types and time trees: Prosodic mining and alignment of temporally annotated data. Berlin: Walter de Gruyter.
Gibbon, D., Gut, U., Hell, B., Looks, K., Thies, A., & Trippel, T. (2003). A computational model of arm gestures in conversation. In Proceedings of Eurospeech 2003 (pp. 813–816).
Grynszpan, O., Martin, J. C., & Oudin, N. (2003). On the annotation of gestures in multimodal autistic behaviour. In The 5th International Workshop on Gesture and Sign Language based Human-Computer Interaction (pp. 25–33).
Gullberg, M. (2010). Methodological reflections on gesture analysis in second language acquisition and bilingualism research. Second Language Research, 26(1), 75–102.
Hampton, J. A. (1981). An investigation of the nature of abstract concepts. Memory and Cognition, 9(2), 149–156.
ISGS (2013). ISGS: International Society for Gesture Studies. http://www.gesturestudies.com. Accessed 30 Apr 2013.
Kadous, M. W. (2002). Temporal classification: Extending the classification paradigms to multivariate time series. Ph.D. thesis, School of Computer Science and Engineering, University of New South Wales.
Kahol, K., Tripathi, P., & Panchanathan, S. (2004). Automated gesture segmentation from dance sequences. In Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition, IEEE (pp. 883–888).
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), The relationship of verbal and nonverbal communication (pp. 207–227). The Hague: Mouton Publishers.
Kendon, A. (2005). Gesture: Visible action as utterance. Cambridge: Cambridge University Press.
Kettebekov, S. (2004). Exploiting prosodic structuring of coverbal gesticulation. In Proceedings of the 6th International Conference on Multimodal interfaces (pp. 105–112). New York, NY: ACM Press.
Kettebekov, S., Yeasin, M., Krahnstoever, N., & Sharma, R. (2002). Prosody based co-analysis of deictic gestures and speech in weather narration broadcast. In Proceedings of the Workshop on Multimodal Resources and Multimodal System Evaluation.
Kettebekov, S., Yeasin, M., & Sharma, R. (2005). Prosody based audiovisual coanalysis for coverbal gesture recognition. IEEE Transactions on Multimedia, 7(2), 234–242.
Kipp, M. (2012). Multimedia annotation, querying, and analysis in ANVIL. London: Wiley.
Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In I. Wachsmuth & F. Fröhlich (Eds.), Gesture and sign language in human-computer interaction (Lecture Notes in Computer Science, Vol. 1371, pp. 23–35). Berlin: Springer.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology. Thousand Oaks, CA: Sage.
Liddell, S. K. (2003). Grammar, gesture, and meaning in American Sign Language. Cambridge: Cambridge University Press.
Liu, J., & Kavakli, M. (2010). A survey of speech-hand gesture recognition for the development of multimodal interfaces in computer games. In Proceedings of the 2010 IEEE International Conference on Multimedia and Expo (pp. 1564–1569).
Lyons, J. (1977). Semantics. Cambridge: Cambridge University Press.
Madeo, R. C. B., Lima, C. A. M., & Peres, S. M. (2013). Gesture unit segmentation using support vector machines: Segmenting gestures from rest positions. In Proceedings of 28th Annual ACM Symposium on Applied Computing (pp. 114–121).
Margolis, E. (1994). A reassessment of the shift from the classical theory of concepts to prototype theory. Cognition, 51(1), 73–89.
Maricchiolo, F., Gnisci, A., & Bonaiuto, M. (2012). Coding hand gestures: A reliable taxonomy and a multi-media support. In Cognitive behavioural systems (pp. 405–416). Springer.
Martell, C. (2002). FORM: An extensible, kinematically-based gesture annotation scheme. In Proceedings of the 2002 International Conference on Language Resources and Evaluation, Las Palmas.
Martell, C. (2005). FORM: An experiment in the annotation of the kinematics of gesture. Ph.D. thesis, University of Pennsylvania.
Martell, C. H., & Kroll, J. (2007). Corpus-based gesture analysis: An extension of the FORM dataset for the automatic detection of phases in a gesture. International Journal of Semantic Computing, 1, 521–536.
McNeill, D. (1992). Hand and mind: What the hands reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
McNeill, D., Quek, F., Mccullough, K., Duncan, S., Furuyama, N., Bryll, R., et al. (2001). Catchments, prosody and discourse. Gesture, 1(1), 9–33.
Mitra, S., & Acharya, T. (2007). Gesture recognition: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 37(3), 311–324.
Mol, L., Krahmer, E., & van de Sandt-Koenderman, M. (2012). Gesturing by aphasic speakers, how does it compare? Journal of Speech, Language and Hearing Research, 56(4), 1224–1236.
Moni, M., & Ali, A. B. M. S. (2009). HMM-based hand gesture recognition: A review on techniques and approaches. In Proceedings of the 2nd IEEE International Conference on Computer Science and Information Technology (pp. 433–437).
Petukhova, V., & Bunt, H. (2009). Dimensions of communication. Technical Report TiCC TR 2009–003, Tilburg University.
Pickering, C. (2005). The search for a safer driver interface: A review of gesture recognition human machine interface. Computing Control Engineering Journal, 16(1), 34–40.
Pokorny, F. (2011). Extraction of prosodic features from speech signals.
Quek, F., McNeill, D., Ansari, R., Ma, X., Bryll, R., Duncan, S., & McCullough, K. (1999). Gesture cues for conversational interaction in monocular video. In Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, IEEE (pp. 119–126).
Quek, F. (2004). The catchment feature model: A device for multimodal fusion and a bridge between signal and sense. EURASIP Journal on Advances in Signal Processing, 2004(11), 1619–1636.
Quek, F., McNeill, D., Bryll, R., Duncan, S., Ma, X., Kirbas, C., et al. (2002). Multimodal human discourse: Gesture and speech. ACM Transactions on Computer Human Interaction (TOCHI), 9(3), 171–193.
Ramakrishnan, A. S. (2011). Segmentation of hand gestures using motion capture data. Master’s thesis, University of California.
Rossini, N. (2004). The analysis of gesture: Establishing a set of parameters. In Gesture-Based Communication in Human-Computer Interaction: 5th International Gesture Workshop (pp. 124–131).
Rossini, N. (2012). Reinterpreting gestures as language: Language in action. Amsterdam: IOS Press.
Scherr, R. E. (2008). Gesture analysis for physics education researchers. Physical Review Special Topics: Physics Education Research, 4(1), 010101.
Sowa, T. (2008). The recognition and comprehension of hand gestures: A review and research agenda. In Proceedings of the Embodied Communication in Humans and Machines, 2nd ZiF Research Group International Conference on Modeling Communication with Robots and Virtual Humans (pp. 38–56). Berlin: Springer.
Theodoridis, T., & Hu, H. (2007). Action classification of 3D human models using dynamic ANNs for mobile robot surveillance. In Proceedings of the International Conference on Robotics and Biomimetics (pp. 371–376).
Wilson, A., Bobick, A., & Cassell, J. (1996). Recovering the temporal structure of natural gesture. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition (pp. 66–71).
Xiong, Y., Quek, F., & McNeill, D. (2002). Hand gesture symmetric behavior detection and analysis in natural conversation. In Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, IEEE Computer Society (pp. 179–184).
Xiong, Y., & Quek, F. (2006). Hand motion gesture frequency properties and multimodal discourse analysis. International Journal of Computer Vision, 69(3), 353–371.
Yang, F.J. (2011). The talking hands? The relation between gesture and language in aphasic patients. Ph.D. thesis, University of Trento.
Acknowledgements
Renata C. B. Madeo thanks the São Paulo Research Foundation for financial support (process number 2011/04608-8).
Madeo, R.C.B., Lima, C.A.M. & Peres, S.M. Studies in automated hand gesture analysis: an overview of functional types and gesture phases. Lang Resources & Evaluation 51, 547–579 (2017). https://doi.org/10.1007/s10579-016-9373-4