
An Asymmetric Modeling for Action Assessment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12375)

Abstract

Action assessment is the task of evaluating how well an action is performed. It is widely applicable to real-world scenarios such as medical treatment and sporting events. However, existing methods for action assessment are mostly limited to individual actions and, in particular, lack modeling of the asymmetric relations among agents (e.g., between persons and objects). This limitation undermines their ability to assess actions containing asymmetrically interactive motion patterns, since subordination between agents commonly exists in interactive actions. In this work, we model the asymmetric interactions among agents for action assessment. In particular, we propose an asymmetric interaction module (AIM) that explicitly models asymmetric interactions between intelligent agents within an action, where we group these agents into a primary one (e.g., a human) and secondary ones (e.g., objects). We perform experiments on the JIGSAWS dataset containing surgical actions, and additionally collect a new dataset, TASD-2, for interactive sporting actions. The experimental results on the two interactive action datasets show the effectiveness of our model, and our method achieves state-of-the-art performance. An extended experiment on the AQA-7 dataset further demonstrates that our framework generalizes to conventional action assessment.
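The abstract does not give implementation details, but the primary/secondary grouping it describes suggests a one-directional attention step: the primary agent aggregates information from the secondary agents, while the secondaries are left untouched. The sketch below is a hypothetical illustration of that idea only (the function name, feature shapes, and dot-product scoring are assumptions, not taken from the paper):

```python
import numpy as np

def asymmetric_interaction(primary, secondaries):
    """Hypothetical one-way attention: the primary agent's feature
    attends over secondary agents' features; secondaries are unchanged."""
    # Relevance score of each secondary agent w.r.t. the primary
    # (simple dot-product attention; shape: (k,)).
    scores = secondaries @ primary
    # Softmax over the secondary agents (numerically stabilized).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context aggregated from the secondaries, fused into the primary feature.
    context = weights @ secondaries
    return primary + context

# Toy example: one primary feature and two secondary features of dimension 2.
primary = np.array([1.0, 0.0])
secondaries = np.array([[0.5, 0.5],
                        [0.0, 1.0]])
updated = asymmetric_interaction(primary, secondaries)
```

The asymmetry lies in the direction of the update: only the primary representation is refined by the interaction, mirroring the subordination between agents that the paper argues is typical of interactive actions.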


Notes

  1. For interactive actions involving more than two performers, important people detection [11, 12, 23] can be used to divide the performers into the primary (the most important one) and the secondary (the rest).

  2. Details of data preprocessing can be found in the supplementary materials.

  3. Videos can be found in the supplementary materials.

References

  1. Bertasius, G., Soo Park, H., Yu, S.X., Shi, J.: Am I a baller? Basketball performance assessment from first-person videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2177–2185 (2017)

  2. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)

  3. Chen, J., Wang, Y., Qin, J., Liu, L., Shao, L.: Fast person re-identification via cross-camera semantic binary transformation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

  4. Doughty, H., Damen, D., Mayol-Cuevas, W.: Who's better, who's best: skill determination in video using deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)

  5. Doughty, H., Mayol-Cuevas, W., Damen, D.: The pros and cons: rank-aware temporal attention for skill determination in long videos, June 2019

  6. Gao, Y., et al.: JHU-ISI gesture and skill assessment working set (JIGSAWS): a surgical activity dataset for human motion modeling. In: MICCAI Workshop: M2CAI, vol. 3, p. 3 (2014)

  7. Gattupalli, S., Ebert, D., Papakostas, M., Makedon, F., Athitsos, V.: CogniLearn: a deep learning-based interface for cognitive behavior assessment. In: Proceedings of the 22nd International Conference on Intelligent User Interfaces, pp. 577–587. ACM (2017)

  8. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. In: IET Conference Proceedings, vol. 5, pp. 850–855, January 1999

  9. Ilg, W., Mezger, J., Giese, M.: Estimation of skill levels in sports based on hierarchical spatio-temporal correspondences. In: Michaelis, B., Krell, G. (eds.) DAGM 2003. LNCS, vol. 2781, pp. 523–531. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45243-0_67

  10. Li, H., Cai, Y., Zheng, W.S.: Deep dual relation modeling for egocentric interaction recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

  11. Li, W.H., Hong, F.T., Zheng, W.S.: Learning to learn relation for important people detection in still images. In: Computer Vision and Pattern Recognition (2019)

  12. Li, W.H., Li, B., Zheng, W.S.: PersonRank: detecting important people in images. In: International Conference on Automatic Face & Gesture Recognition (FG 2018) (2018)

  13. Malpani, A., Vedula, S.S., Chen, C.C.G., Hager, G.D.: Pairwise comparison-based objective score for automated skill assessment of segments in a surgical task. In: Stoyanov, D., Collins, D.L., Sakuma, I., Abolmaesumi, P., Jannin, P. (eds.) IPCAI 2014. LNCS, vol. 8498, pp. 138–147. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07521-1_15

  14. Paiement, A., Tao, L., Hannuna, S., Camplani, M., Damen, D., Mirmehdi, M.: Online quality assessment of human movement from skeleton data. In: British Machine Vision Conference, pp. 153–166. BMVA Press (2014)

  15. Pan, J.H., Gao, J., Zheng, W.S.: Action assessment by joint relation graphs. In: The IEEE International Conference on Computer Vision (ICCV), October 2019

  16. Parmar, P., Morris, B.T.: What and how well you performed? A multitask learning approach to action quality assessment. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

  17. Parmar, P., Tran Morris, B.: Learning to score Olympic events. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 20–28 (2017)

  18. Parmar, P., Tran Morris, B.: Action quality assessment across multiple actions. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1468–1476, January 2019. https://doi.org/10.1109/WACV.2019.00161

  19. Pérez, J.S., Meinhardt-Llopis, E., Facciolo, G.: TV-L1 optical flow estimation. Image Processing On Line, pp. 137–150 (2013)

  20. Pirsiavash, H., Vondrick, C., Torralba, A.: Assessing the quality of actions. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 556–571. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_36

  21. Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2009)

  22. Sharma, Y., et al.: Video based assessment of OSATS using sequential motion textures. Georgia Institute of Technology (2014)

  23. Solomon Mathialagan, C., Gallagher, A.C., Batra, D.: VIP: finding important people in images. In: Computer Vision and Pattern Recognition (2015)

  24. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems 30, pp. 5998–6008. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

  25. Wang, Z., Lu, J., Tao, C., Zhou, J., Tian, Q.: Learning channel-wise interactions for binary convolutional neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

  26. Xu, C., Fu, Y., Zhang, B., Chen, Z., Jiang, Y.G., Xue, X.: Learning to score the figure skating sports videos. arXiv preprint arXiv:1802.02774 (2018)

  27. Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

  28. Zhang, Q., Li, B.: Video-based motion expertise analysis in simulation-based surgical training using hierarchical Dirichlet process hidden Markov model. In: Proceedings of the 2011 International ACM Workshop on Medical Multimedia Analysis and Retrieval, pp. 19–24. ACM (2011)

  29. Zhang, Q., Li, B.: Relative hidden Markov models for video-based evaluation of motion skills in surgical training. IEEE Trans. Pattern Anal. Mach. Intell. 37(6), 1206–1218 (2015)

  30. Zia, A., Essa, I.: Automated surgical skill assessment in RMIS training. Int. J. Comput. Assisted Radiol. Surgery 13, 731–739 (2018)

  31. Zia, A., Sharma, Y., Bettadapura, V., Sarin, E.L., Clements, M.A., Essa, I.: Automated assessment of surgical skills using frequency analysis. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 430–438. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24553-9_53

  32. Zia, A., Sharma, Y., Bettadapura, V., Sarin, E.L., Essa, I.: Video and accelerometer-based motion analysis for automated surgical skills assessment. Int. J. Comput. Assisted Radiol. Surgery 13(3), 443–455 (2018)

  33. Zia, A., et al.: Automated video-based assessment of surgical skills for training and evaluation in medical schools. Int. J. Comput. Assisted Radiol. Surgery 11(9), 1623–1636 (2016)


Acknowledgement

This work was supported partially by the National Key Research and Development Program of China (2018YFB1004903), NSFC (U1911401, U1811461), Guangdong Province Science and Technology Innovation Leading Talents (2016TX03X157), Guangdong NSF Project (No. 2018B030312002), Guangzhou Research Project (201902010037), and Research Projects of Zhejiang Lab (No. 2019KD0AB03).

Author information

Correspondence to Wei-Shi Zheng or Chengying Gao.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 4 (PDF, 283 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Gao, J., et al. (2020). An Asymmetric Modeling for Action Assessment. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12375. Springer, Cham. https://doi.org/10.1007/978-3-030-58577-8_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58577-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58576-1

  • Online ISBN: 978-3-030-58577-8

  • eBook Packages: Computer Science, Computer Science (R0)
