Global and Local C3D Ensemble System for First Person Interactive Action Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10705)

Abstract

Action recognition in first person videos differs from that in third person videos. In this paper, we aim to recognize interactive actions in first person videos. First person interactive actions contain two kinds of motion: the ego-motion of the observer and the motion of the actor. To enable an observer to understand “what activity others are doing to me”, we propose a twin-stream network architecture based on 3D convolutional networks (C3D). The global action C3D learns interactions together with ego-motion, while the local salient motion C3D analyzes the actor's motion within a salient region, which matters especially when the action happens at a distance from the observer. We also propose a sampling method to extract clips as input to the C3D models and investigate different C3D architectures to improve performance. We carry out experiments on the JPL first-person interaction benchmark dataset. Experimental results show that the ensemble of global and local networks improves accuracy over state-of-the-art methods by 3.26%.
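The abstract does not give the exact layer configuration, crop extraction, or fusion rule, so the sketch below is only illustrative: it pairs two small C3D-style 3D-CNN streams, one fed full-frame clips (global, carrying ego-motion) and one fed clips cropped to a salient motion region (local), and fuses them by averaging class probabilities. The names MiniC3D and GlobalLocalEnsemble, all layer sizes, and the score-averaging fusion are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a twin-stream global/local 3D-CNN ensemble in PyTorch.
# Layer sizes, crop strategy, and late-fusion rule are assumptions; the
# paper's exact C3D configuration is not described in the abstract.
import torch
import torch.nn as nn


class MiniC3D(nn.Module):
    """A small C3D-style backbone: stacked 3D convolutions over a clip of frames."""

    def __init__(self, num_classes: int, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),   # pool only spatially at first
            nn.Conv3d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),           # pool over time and space
            nn.Conv3d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),               # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip shape: (batch, channels, frames, height, width)
        x = self.features(clip).flatten(1)
        return self.classifier(x)                  # per-class logits


class GlobalLocalEnsemble(nn.Module):
    """Twin-stream ensemble: a global full-frame C3D plus a local salient-region
    C3D, fused by averaging the two streams' class probabilities (late fusion)."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.global_c3d = MiniC3D(num_classes)   # sees the whole frame (ego-motion)
        self.local_c3d = MiniC3D(num_classes)    # sees a cropped salient motion region

    def forward(self, global_clip: torch.Tensor, local_clip: torch.Tensor) -> torch.Tensor:
        p_global = torch.softmax(self.global_c3d(global_clip), dim=1)
        p_local = torch.softmax(self.local_c3d(local_clip), dim=1)
        return (p_global + p_local) / 2           # ensemble score per action class


if __name__ == "__main__":
    model = GlobalLocalEnsemble(num_classes=7)    # e.g. 7 interaction classes (JPL)
    # 16-frame clips: a full-frame input and a hypothetical salient-region crop
    global_clip = torch.randn(2, 3, 16, 112, 112)
    local_clip = torch.randn(2, 3, 16, 112, 112)
    scores = model(global_clip, local_clip)
    print(scores.shape)                           # torch.Size([2, 7])
```

Late score fusion keeps the two streams independent, so each can be trained on its own clip sampling; the paper's actual combination scheme may differ.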



Acknowledgement

This work was supported in part by the 973 Program (Project No. 2014CB347600), the Natural Science Foundation of Jiangsu Province (Grant No. BK20170856), the National Natural Science Foundation of China (Grant Nos. 61672285 and 61702265), and the CCF-Tencent Open Research Fund.

Author information


Corresponding author

Correspondence to Yan Song.



Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Fa, L., Song, Y., Shu, X. (2018). Global and Local C3D Ensemble System for First Person Interactive Action Recognition. In: Schoeffmann, K., et al. (eds.) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_14

  • DOI: https://doi.org/10.1007/978-3-319-73600-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73599-3

  • Online ISBN: 978-3-319-73600-6

  • eBook Packages: Computer Science, Computer Science (R0)
