Conversational Interaction Recognition Based on Bodily and Facial Movement

  • Conference paper
Image Analysis and Recognition (ICIAR 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8814)

We examine whether 3D pose and face features can be used to both learn and recognize different conversational interactions. We believe this to be among the first works devoted to this subject, and we show that the task is feasible with a promising degree of accuracy using features derived from both pose and face. To extract 3D pose we use the Kinect sensor, and we use a combined local and global model to extract face features from ordinary RGB cameras. We show that although both kinds of features are contaminated with noise, they can still be used to train classifiers effectively. The differences in interaction among the scenarios in our data set are extremely subtle. Both generative and discriminative methods are investigated, and a subject-specific supervised learning approach is employed to classify the test sequences into seven different conversational scenarios.
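As a rough illustration of the subject-specific setup described above, the following minimal sketch trains and tests on sequences from a single subject, using concatenated pose and face descriptors. It is not the method of the paper: the nearest-centroid classifier, the mean-pooled descriptor, the function names, and the placeholder labels `scenario_A` and `scenario_B` are all assumptions made purely for illustration.

```python
from collections import defaultdict
from math import dist


def sequence_descriptor(pose_frames, face_frames):
    """Mean-pool per-frame pose and face vectors, then concatenate them.

    Hypothetical descriptor: the paper's actual features are not reproduced here.
    """
    def mean(frames):
        n = len(frames)
        return [sum(column) / n for column in zip(*frames)]

    return mean(pose_frames) + mean(face_frames)


def train_subject_model(labelled_descriptors):
    """Build one centroid per scenario from ONE subject's training sequences."""
    grouped = defaultdict(list)
    for descriptor, label in labelled_descriptors:
        grouped[label].append(descriptor)
    return {
        label: [sum(column) / len(descriptors) for column in zip(*descriptors)]
        for label, descriptors in grouped.items()
    }


def classify(model, descriptor):
    """Return the scenario whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], descriptor))


# Toy data for a single subject (values are made up, not from the data set).
train = [
    (sequence_descriptor([[0.0, 0.1], [0.2, 0.1]], [[1.0], [1.0]]), "scenario_A"),
    (sequence_descriptor([[0.9, 0.8], [1.1, 1.0]], [[0.0], [0.2]]), "scenario_B"),
]
model = train_subject_model(train)

test_descriptor = sequence_descriptor([[0.1, 0.0], [0.1, 0.2]], [[0.9], [1.1]])
predicted = classify(model, test_descriptor)
```

In a real experiment the model would be built from many sequences per scenario for each subject, and any generative or discriminative classifier could take the place of the nearest-centroid rule.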

Author information

Corresponding author

Correspondence to Xianghua Xie.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Deng, J., Xie, X., Zhou, S. (2014). Conversational Interaction Recognition Based on Bodily and Facial Movement. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8814. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11757-7

  • Online ISBN: 978-3-319-11758-4

  • eBook Packages: Computer Science, Computer Science (R0)
