
HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12356)


Abstract

Predicting the class label of a partially observed activity sequence is a very challenging task, as the observed early segments of different activities can be very similar. In this paper, we propose a novel Hardness-AwaRe Discrimination Network (HARD-Net) to explicitly investigate the relationships between similar activity pairs that are hard to discriminate. Specifically, we design a Hard Instance-Interference Class (HI-IC) bank that dynamically records such hard similar pairs. Based on the HI-IC bank, we propose a novel adversarial learning scheme to train our HARD-Net, which grants the network a strong capability for mining subtle discriminative information for 3D early activity prediction. We evaluate the proposed HARD-Net on two public activity datasets and achieve state-of-the-art performance.
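To make the idea above concrete, here is a minimal, hypothetical sketch in PyTorch of what a Hard Instance-Interference Class (HI-IC) bank could look like, based only on the abstract: for each hard training sample it records the "interference" class whose score comes closest to the true class, so these easily confused pairs can be revisited in later training (for example, by an adversarial discrimination loss). The class name `HIICBank`, the margin threshold, and the update rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical HI-IC bank sketch (assumptions: names, threshold, update rule).
from collections import defaultdict, deque

import torch


class HIICBank:
    """Dynamically records hard (instance, interference class) pairs.

    For each sample whose top-scoring wrong class ("interference class") comes
    close to the true class, the sample index is stored under the key
    (true_class, interference_class) so later training steps can revisit
    these easily confused pairs.
    """

    def __init__(self, capacity_per_pair: int = 64, margin: float = 0.1):
        self.bank = defaultdict(lambda: deque(maxlen=capacity_per_pair))
        self.margin = margin  # how close the wrong class must be to count as "hard"

    @torch.no_grad()
    def update(self, logits: torch.Tensor, labels: torch.Tensor, sample_ids: torch.Tensor):
        probs = logits.softmax(dim=1)
        for p, y, sid in zip(probs, labels, sample_ids):
            y = int(y)
            # find the most confusing wrong class for this sample
            masked = p.clone()
            masked[y] = -1.0
            interference = int(masked.argmax())
            # record the pair if the wrong class is within `margin` of the true class
            if p[interference] > p[y] - self.margin:
                self.bank[(y, interference)].append(int(sid))

    def sample_pairs(self, num_pairs: int):
        """Return up to `num_pairs` (true_class, interference_class, ids) entries
        for hardness-aware re-training."""
        keys = list(self.bank.keys())[:num_pairs]
        return [(k[0], k[1], list(self.bank[k])) for k in keys]


# Usage sketch: update the bank after a forward pass on partially observed sequences.
if __name__ == "__main__":
    bank = HIICBank()
    logits = torch.randn(8, 10)          # 8 partial sequences, 10 activity classes
    labels = torch.randint(0, 10, (8,))
    ids = torch.arange(8)
    bank.update(logits, labels, ids)
    print(bank.sample_pairs(3))
```

In the paper's setting the stored entries would index partially observed 3D skeleton sequences; plain sample indices stand in for them here.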



Acknowledgement

This work is supported by SUTD Project PIE-SGP-AI-2020-02, SUTD Project SRG-ISTD-2020-153, the National Natural Science Foundation of China under Grants 61991411 and U1913204, the National Key Research and Development Plan of China under Grant 2017YFB1300205, and the Shandong Major Scientific and Technological Innovation Project (MSTIP) under Grant 2018CXGC1503.

Author information


Corresponding authors

Correspondence to Jun Liu or Wei Zhang.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, T., Liu, J., Zhang, W., Duan, L. (2020). HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12356. Springer, Cham. https://doi.org/10.1007/978-3-030-58621-8_25


  • DOI: https://doi.org/10.1007/978-3-030-58621-8_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58620-1

  • Online ISBN: 978-3-030-58621-8

  • eBook Packages: Computer Science, Computer Science (R0)
