Abstract
Action assessment evaluates how well an action is performed. It is widely applicable to real-world scenarios such as medical treatment and sporting events. However, existing methods for action assessment are mostly limited to individual actions and, in particular, lack modeling of the asymmetric relations among agents (e.g., between persons and objects). This limitation undermines their ability to assess actions containing asymmetrically interactive motion patterns, since subordination between agents commonly exists in interactive actions. In this work, we model the asymmetric interactions among agents for action assessment. Specifically, we propose an asymmetric interaction module (AIM) that explicitly models asymmetric interactions between agents within an action, grouping the agents into a primary one (e.g., a human) and secondary ones (e.g., objects). We perform experiments on the JIGSAWS dataset of surgical actions and additionally collect a new dataset, TASD-2, of interactive sporting actions. The results on these two interactive-action datasets show the effectiveness of our model, which achieves state-of-the-art performance. An extended experiment on the AQA-7 dataset further demonstrates that our framework generalizes to conventional action assessment.
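The abstract describes the AIM only at a high level: a primary agent's features are enhanced by one-directional interaction with secondary agents. As a minimal sketch of what such an asymmetric interaction could look like (all names, dimensions, and the attention formulation here are illustrative assumptions, not the paper's actual implementation), the primary agent can act as the sole query over secondary-agent features, so that information flows from secondaries to the primary but not back:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def asymmetric_interaction(primary, secondaries, W_q, W_k, W_v):
    """One-directional attention: the primary agent queries the secondary
    agents; secondaries never attend back, which is the asymmetry."""
    q = W_q @ primary                         # query from the primary agent, shape (d,)
    keys = secondaries @ W_k.T                # (num_secondary, d)
    vals = secondaries @ W_v.T                # (num_secondary, d)
    weights = softmax(keys @ q / np.sqrt(len(q)))  # attention over secondaries
    context = weights @ vals                  # aggregated secondary information
    return np.concatenate([primary, context])  # interaction-enhanced primary feature

# Toy example: one primary agent (e.g., a surgeon) and three secondary
# agents (e.g., instruments), each described by an 8-d feature vector.
rng = np.random.default_rng(0)
d = 8
primary = rng.normal(size=d)
secondaries = rng.normal(size=(3, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = asymmetric_interaction(primary, secondaries, W_q, W_k, W_v)
print(out.shape)  # (16,)
```

The enhanced feature could then feed a downstream score regressor; grouping agents this way is what distinguishes the asymmetric design from symmetric, fully pairwise relation graphs.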
Notes
1.
2. Details of data preprocessing can be found in the supplementary materials.
3. Videos can be found in the supplementary materials.
Acknowledgement
This work was supported partially by the National Key Research and Development Program of China (2018YFB1004903), NSFC (U1911401, U1811461), Guangdong Province Science and Technology Innovation Leading Talents (2016TX03X157), Guangdong NSF Project (No. 2018B030312002), Guangzhou Research Project (201902010037), and Research Projects of Zhejiang Lab (No. 2019KD0AB03).
© 2020 Springer Nature Switzerland AG
Cite this paper
Gao, J. et al. (2020). An Asymmetric Modeling for Action Assessment. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12375. Springer, Cham. https://doi.org/10.1007/978-3-030-58577-8_14
Print ISBN: 978-3-030-58576-1
Online ISBN: 978-3-030-58577-8