Action Prediction Using Unsupervised Semantic Reasoning

Liu, Cuiwei; Lu, Yaguang; Shi, Xiangbin; Li, Zhaokui; Zhao, Liang

doi:10.1007/978-3-319-70090-8_50


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10636)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

This paper addresses the problem of predicting the category of an ongoing action in a video, so that a response can be made as quickly as possible. Action prediction is challenging because neither the complete semantic information nor the definite temporal progress can be obtained from a partially observed video. We propose to predict the action categories of unfinished videos by semantic reasoning. To exploit mid-level semantics, we present an unsupervised semantic mining approach that expresses an observed video as a sequence of semantic concepts and learns the contextual relationships among concepts with a General Mixture Transform Distribution model (GMTD). The unobserved future semantic concepts can then be estimated automatically from the observed concept sequence. Finally, we develop a discriminative structural model that integrates video observations, observed semantic concepts, and inferred semantic concepts for early recognition of incomplete videos. Experimental results on the UT-Interaction dataset show that the proposed method can effectively predict the action category of an unfinished video.
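The step of estimating future concepts from an observed concept sequence can be illustrated with a small sketch. It is not the authors' GMTD, whose details the abstract does not give. It assumes a generalized mixture transition distribution in the sense of Berchtold and Raftery (2002), where the next concept is drawn from a weighted mixture of one transition matrix per lag. The integer concept IDs, the hand-set lag weights, the count-based estimation, and the function names `fit_lag_transitions`, `predict_next`, and `infer_future` are all hypothetical choices made for illustration.

```python
def fit_lag_transitions(sequences, n_concepts, max_lag, smoothing=1.0):
    """Estimate one row-stochastic transition matrix per lag by smoothed counting.

    Assumption: concepts are integer IDs in range(n_concepts).
    """
    matrices = []
    for lag in range(1, max_lag + 1):
        counts = [[smoothing] * n_concepts for _ in range(n_concepts)]
        for seq in sequences:
            for t in range(lag, len(seq)):
                counts[seq[t - lag]][seq[t]] += 1.0
        matrices.append([[c / sum(row) for c in row] for row in counts])
    return matrices


def predict_next(history, matrices, lag_weights):
    """Return P(next = j) = sum over lags g of w_g * Q_g[history[-g]][j].

    Lags longer than the history are skipped and the remaining weights
    are renormalized (an illustrative choice, not taken from the paper).
    """
    if not history:
        raise ValueError("history must contain at least one concept")
    n_concepts = len(matrices[0])
    lags = [g for g in range(1, len(matrices) + 1) if g <= len(history)]
    total = sum(lag_weights[g - 1] for g in lags)
    probs = [0.0] * n_concepts
    for g in lags:
        weight = lag_weights[g - 1] / total
        row = matrices[g - 1][history[-g]]
        for j in range(n_concepts):
            probs[j] += weight * row[j]
    return probs


def infer_future(history, matrices, lag_weights, steps):
    """Greedily extend the observed sequence by the most likely concept."""
    seq = list(history)
    for _ in range(steps):
        probs = predict_next(seq, matrices, lag_weights)
        seq.append(max(range(len(probs)), key=probs.__getitem__))
    return seq[len(history):]


# Toy data: two "videos" that cycle through three concepts.
training = [[0, 1, 2, 0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]]
matrices = fit_lag_transitions(training, n_concepts=3, max_lag=2)
lag_weights = [0.7, 0.3]  # hand-set here; a real model would learn them
next_probs = predict_next([0, 1], matrices, lag_weights)
future = infer_future([0, 1], matrices, lag_weights, steps=3)
print(future)
```

In this toy setting the observed prefix `[0, 1]` is extended along the learned cycle. A full system would presumably feed both the observed and the inferred concepts into a classifier, as the abstract describes for its discriminative structural model.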



Acknowledgments

This work was supported in part by the Natural Science Foundation of China (NSFC) under Grants No. 61602320 and No. 61170185, the Liaoning Doctoral Startup Project under Grants No. 201601172 and No. 201601180, the Foundation of Liaoning Educational Committee under Grants No. L201607 and No. L2015403, and the Young Scholars Research Fund of SAU under Grant No. 15YB37.

Author information

Authors and Affiliations

School of Computer Science, Shenyang Aerospace University, Shenyang, Liaoning, People’s Republic of China
Cuiwei Liu, Xiangbin Shi, Zhaokui Li & Liang Zhao
School of Information, Liaoning University, Shenyang, Liaoning, People’s Republic of China
Yaguang Lu & Xiangbin Shi


Corresponding author

Correspondence to Cuiwei Liu.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy

About this paper

Cite this paper

Liu, C., Lu, Y., Shi, X., Li, Z., Zhao, L. (2017). Action Prediction Using Unsupervised Semantic Reasoning. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_50


DOI: https://doi.org/10.1007/978-3-319-70090-8_50
Published: 28 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70089-2
Online ISBN: 978-3-319-70090-8
eBook Packages: Computer Science, Computer Science (R0)
