Action Prediction Using Unsupervised Semantic Reasoning

  • Conference paper
Neural Information Processing (ICONIP 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10636)

Abstract

This paper addresses the problem of predicting the category of an ongoing action in a video, which enables a system to react as quickly as possible. Action prediction is a challenging problem, since neither the complete semantic information nor the definite temporal progress can be obtained from a partially observed video. We propose to predict the action categories of unfinished videos by semantic reasoning. To exploit mid-level semantics from videos, we present an unsupervised semantic mining approach that expresses an observed video as a sequence of semantic concepts and learns the contextual relationships among concepts using a General Mixture Transform Distribution model (GMTD). The unobserved future semantic concepts can then be estimated automatically from the observed semantic concept sequence. Finally, we develop a discriminative structural model that integrates video observations, observed semantic concepts, and inferred semantic concepts for early recognition of incomplete videos. Experimental results on the UT-Interaction dataset show that the proposed method can effectively predict the action category of an unfinished video.
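
The step of estimating future concepts from an observed concept sequence can be illustrated with a small sketch. It is not the authors' GMTD implementation, which the abstract does not specify. It assumes the classic mixture transition distribution form, in which the next-concept probability is a weighted sum over lags of rows of one shared transition matrix. The function names, the lag weights, the transition matrix, and the greedy rollout are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def mtd_next_distribution(
    history: Sequence[int],
    lag_weights: Sequence[float],
    transition: Sequence[Sequence[float]],
) -> List[float]:
    """Return P(next concept | history) under a mixture transition model.

    Assumed form (illustrative, not taken from the paper):
        P(next = j | history) = sum_g lag_weights[g] * transition[history[-1 - g]][j]
    where g = 0 refers to the most recent observed concept.
    """
    if not history:
        raise ValueError("at least one observed concept is required")
    n_states = len(transition)
    probs = [0.0] * n_states
    for g, weight in enumerate(lag_weights):
        if g >= len(history):
            break  # fewer observations than lags: use the lags that exist
        prev = history[-1 - g]
        for j in range(n_states):
            probs[j] += weight * transition[prev][j]
    total = sum(probs)  # renormalise in case some lags were skipped
    return [p / total for p in probs]


def predict_future_concepts(
    observed: Sequence[int],
    lag_weights: Sequence[float],
    transition: Sequence[Sequence[float]],
    n_future: int,
) -> List[int]:
    """Greedily roll the model forward, appending the most likely concept each step."""
    history = list(observed)
    future: List[int] = []
    for _ in range(n_future):
        probs = mtd_next_distribution(history, lag_weights, transition)
        nxt = max(range(len(probs)), key=probs.__getitem__)
        future.append(nxt)
        history.append(nxt)
    return future


if __name__ == "__main__":
    # Made-up example: three concepts that tend to follow one another in a cycle.
    Q = [
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8],
        [0.8, 0.1, 0.1],
    ]
    weights = [0.7, 0.3]  # most recent concept counts more than the one before it
    print(predict_future_concepts([0, 1], weights, Q, n_future=3))
```

In a full pipeline along the lines the abstract describes, the inferred concepts would be passed, together with the observed concepts and video features, to a downstream classifier. That stage is omitted here.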

Acknowledgments

This work was supported in part by the Natural Science Foundation of China (NSFC) under Grant No. 61602320 and No. 61170185, the Liaoning Doctoral Startup Project under Grant No. 201601172 and No. 201601180, the Foundation of Liaoning Educational Committee under Grant No. L201607 and No. L2015403, and the Young Scholars Research Fund of SAU under Grant No. 15YB37.

Author information

Corresponding author

Correspondence to Cuiwei Liu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Liu, C., Lu, Y., Shi, X., Li, Z., Zhao, L. (2017). Action Prediction Using Unsupervised Semantic Reasoning. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_50

  • DOI: https://doi.org/10.1007/978-3-319-70090-8_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70089-2

  • Online ISBN: 978-3-319-70090-8

  • eBook Packages: Computer Science, Computer Science (R0)
