Interaction Recognition Through Body Parts Relation Reasoning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12046)

Abstract

Person-person mutual action recognition (also referred to as interaction recognition) is an important research branch of human activity analysis. Early solutions relied on carefully designed local points and hand-crafted features; later ones moved to deep learning architectures such as CNNs and LSTMs. These solutions often use complicated architectures and mechanisms that embed the relationships between the two persons in the architecture itself, so that the interaction patterns can be properly learned. Our contribution is a simpler yet very powerful architecture, named Interaction Relational Network, which uses minimal prior knowledge about the structure of the data. We drive the network to learn how to relate the body parts of the interacting persons, in order to better discriminate among the possible interactions. By representing the body parts across the frames as sets of independent joints, and with a few augmentations to our architecture that explicitly extract meaningful extra information from each pair of joints, our solution achieves state-of-the-art performance on the traditional interaction recognition dataset SBU, and also on the mutual actions from the large-scale dataset NTU RGB+D.
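To make the idea of reasoning over pairs of joints concrete, the following is a minimal sketch only, not the authors' implementation. It assumes a relation-network style design: a shared function `g` is applied to every pair made of one joint from each person, the results are summed, and a second function `f` maps the pooled vector to class scores. The single-frame input, the layer sizes, the random untrained weights, the Euclidean distance used as the "extra" pairwise feature, and all function names are assumptions made for illustration.

```python
import math
import random
from itertools import product


def make_linear(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def linear(x, layer):
    """Affine map: one output per weight row."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def linear_relu(x, layer):
    """Affine map followed by ReLU."""
    return [max(0.0, v) for v in linear(x, layer)]


def pair_features(joint_a, joint_b):
    """Concatenate both joints and append their distance (an assumed extra feature)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(joint_a, joint_b)))
    return list(joint_a) + list(joint_b) + [dist]


def relational_scores(person_a, person_b, g_layer, f_layer):
    """Sum g over all inter-person joint pairs, then map the pooled vector with f."""
    pooled = [0.0] * len(g_layer[0])  # one slot per output unit of g
    for joint_a, joint_b in product(person_a, person_b):
        hidden = linear_relu(pair_features(joint_a, joint_b), g_layer)
        pooled = [p + h for p, h in zip(pooled, hidden)]
    return linear(pooled, f_layer)


# Hypothetical sizes: joints per person, coordinates per joint, hidden units, classes.
NUM_JOINTS, DIM, HIDDEN, NUM_CLASSES = 15, 3, 8, 4
rng = random.Random(0)
g_layer = make_linear(2 * DIM + 1, HIDDEN, rng)
f_layer = make_linear(HIDDEN, NUM_CLASSES, rng)
person_a = [[rng.random() for _ in range(DIM)] for _ in range(NUM_JOINTS)]
person_b = [[rng.random() for _ in range(DIM)] for _ in range(NUM_JOINTS)]
scores = relational_scores(person_a, person_b, g_layer, f_layer)
```

Because the pairwise outputs are summed, the scores do not depend on the order in which joints are listed, which is one way to treat the body as a set of independent joints with little prior knowledge about the skeleton structure.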

A. C. Kot—This research was carried out at the Rapid-Rich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore. The ROSE Lab is supported by the National Research Foundation, Singapore, and the Infocomm Media Development Authority, Singapore.

Corresponding author

Correspondence to Mauricio Perez.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Perez, M., Liu, J., Kot, A.C. (2020). Interaction Recognition Through Body Parts Relation Reasoning. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_19

  • DOI: https://doi.org/10.1007/978-3-030-41404-7_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41403-0

  • Online ISBN: 978-3-030-41404-7

  • eBook Packages: Computer Science, Computer Science (R0)
