Interaction Recognition Through Body Parts Relation Reasoning

Perez, Mauricio; Liu, Jun; Kot, Alex C.

doi:10.1007/978-3-030-41404-7_19

Conference paper
First Online: 23 February 2020

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12046)

Abstract

Person-person mutual action recognition (also referred to as interaction recognition) is an important research branch of human activity analysis. Early solutions were based on carefully designed local points and hand-crafted features; later work progressed to deep learning architectures such as CNNs and LSTMs. These solutions often rely on complicated architectures and mechanisms that embed the relationships between the two persons in the architecture itself, to ensure that the interaction patterns can be properly learned. Our contribution is a simpler yet very powerful architecture, named Interaction Relational Network, which uses minimal prior knowledge about the structure of the data. We drive the network to learn how to relate the body parts of the interacting persons, in order to better discriminate among the possible interactions. By breaking down the body parts through the frames into sets of independent joints, and with a few augmentations to our architecture that explicitly extract meaningful extra information from each pair of joints, our solution achieves state-of-the-art performance on the traditional interaction recognition dataset SBU, and also on the mutual actions from the large-scale dataset NTU RGB+D.
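
The pairwise reasoning described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It only shows the general relation-network pattern: a shared function g is applied to every pair of joints, the results are summed, and a second function f maps the pooled vector to class scores. Everything specific here is an assumption made for illustration: the function names, the layer sizes, the random (untrained) weights, the choice to pair joints across the two persons only, and the per-frame joint distance standing in for the "extra information" extracted from each pair.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (illustrative only)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply ReLU layers, leaving the last layer linear."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def pair_features(person_a, person_b):
    """Build one descriptor per (joint of A, joint of B) pair.

    person_a, person_b: arrays of shape (frames, joints, 3).
    Returns an array of shape (joints * joints, 7 * frames).
    """
    frames, joints, _ = person_a.shape
    # Each joint becomes an independent object: its trajectory through the frames.
    a = person_a.transpose(1, 0, 2)          # (joints, frames, 3)
    b = person_b.transpose(1, 0, 2)
    a_rep = np.repeat(a, joints, axis=0)     # A-joint i repeated for every B-joint
    b_rep = np.tile(b, (joints, 1, 1))       # B-joints cycled for each A-joint
    # Assumed extra information: per-frame Euclidean distance between the two joints.
    dist = np.linalg.norm(a_rep - b_rep, axis=-1, keepdims=True)
    feats = np.concatenate([a_rep, b_rep, dist], axis=-1)   # (J*J, frames, 7)
    return feats.reshape(joints * joints, frames * 7)

def relational_scores(g_params, f_params, person_a, person_b):
    """Score interaction classes from two skeleton sequences."""
    pairs = pair_features(person_a, person_b)
    relations = mlp(g_params, pairs)         # shared g applied to every pair
    pooled = relations.sum(axis=0)           # order-invariant aggregation
    return mlp(f_params, pooled)             # f maps pooled relations to class scores

# Hypothetical sizes, chosen only to keep the example small.
frames, joints, classes = 4, 5, 3
rng = np.random.default_rng(0)
person_a = rng.standard_normal((frames, joints, 3))
person_b = rng.standard_normal((frames, joints, 3))
g_params = init_mlp(rng, [7 * frames, 32, 16])
f_params = init_mlp(rng, [16, 16, classes])
scores = relational_scores(g_params, f_params, person_a, person_b)
print(scores.shape)
```

Because the pair outputs are summed before classification, the scores do not depend on the order in which joints are listed, which is one way of reading the abstract's point about using minimal prior knowledge of the data's structure.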

A. C. Kot—This research was carried out at the Rapid-Rich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore. The ROSE Lab is supported by the National Research Foundation, Singapore, and the Infocomm Media Development Authority, Singapore.

Author information

Authors and Affiliations

Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
Mauricio Perez, Jun Liu & Alex C. Kot

Corresponding author

Correspondence to Mauricio Perez.

Editor information

Editors and Affiliations

University of Malaya, Kuala Lumpur, Malaysia
Shivakumara Palaiahnakote
Consiglio Nazionale delle Ricerche, ICAR, Naples, Italy
Gabriella Sanniti di Baja
Chinese Academy of Sciences, Beijing, China
Liang Wang
Auckland University of Technology, Auckland, New Zealand
Wei Qi Yan

About this paper

Cite this paper

Perez, M., Liu, J., Kot, A.C. (2020). Interaction Recognition Through Body Parts Relation Reasoning. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_19

DOI: https://doi.org/10.1007/978-3-030-41404-7_19
Published: 23 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41403-0
Online ISBN: 978-3-030-41404-7
eBook Packages: Computer Science, Computer Science (R0)
