Abstract
We examine whether 3D pose and face features can be used to both learn and recognize different conversational interactions. We believe this to be among the first works devoted to this subject, and we show that the task is indeed possible with a promising degree of accuracy using features derived from both pose and face. To extract 3D pose we use the Kinect sensor, and we use a combined local and global model to extract face features from ordinary RGB cameras. We show that although both sets of features are contaminated with noise, they can still be used to train classifiers effectively. The differences in interaction among the scenarios in our data set are extremely subtle. Both generative and discriminative methods are investigated, and a subject-specific supervised learning approach is employed to classify the test sequences into seven different conversational scenarios.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Deng, J., Xie, X., Zhou, S. (2014). Conversational Interaction Recognition Based on Bodily and Facial Movement. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8814. Springer, Cham. https://doi.org/10.1007/978-3-319-11758-4_26
DOI: https://doi.org/10.1007/978-3-319-11758-4_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11757-7
Online ISBN: 978-3-319-11758-4
eBook Packages: Computer Science, Computer Science (R0)