Abstract
This paper addresses the problem of predicting the category of an ongoing action in a video, which enables us to react as quickly as possible. Action prediction is a challenging problem because neither the complete semantic information nor the definite temporal progress of an action can be obtained from a partially observed video. In this paper, we propose to predict the action categories of unfinished videos by semantic reasoning. To exploit mid-level semantics in videos, we present an unsupervised semantic mining approach that expresses an observed video as a sequence of semantic concepts and learns the contextual relationships among concepts with a Generalized Mixture Transition Distribution (GMTD) model. The unobserved future semantic concepts can then be estimated automatically from the observed concept sequence. Finally, we develop a discriminative structural model that integrates video observations, observed semantic concepts, and inferred semantic concepts for the early recognition of incomplete videos. Experimental results on the UT-Interaction dataset show that the proposed method effectively predicts the action category of an unfinished video.
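The abstract does not spell out the form of the GMTD model. For orientation, the mixture transition distribution (MTD) family of Berchtold and Raftery writes a high-order transition probability as a weighted mixture of first-order terms, one per lag: P(X_t = j | X_{t-1} = i_1, ..., X_{t-L} = i_L) = Σ_g λ_g q^(g)_{i_g j}, with λ_g ≥ 0 and Σ_g λ_g = 1. The Python sketch below assumes that form, with a separate transition matrix per lag, and rolls an observed concept sequence forward by greedy argmax decoding. The function names, the toy matrices, and the decoding rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gmtd_next_distribution(history, lambdas, Qs):
    """Distribution of the next concept under an assumed lag-specific MTD model.

    history : sequence of concept indices in [0, K), most recent last.
    lambdas : (L,) non-negative lag weights that sum to 1.
    Qs      : (L, K, K) array; Qs[g] is the row-stochastic transition
              matrix applied to the concept observed g + 1 steps back.
    """
    lambdas = np.asarray(lambdas, dtype=float)
    Qs = np.asarray(Qs, dtype=float)
    L = lambdas.shape[0]
    if len(history) < L:
        raise ValueError("need at least L observed concepts")
    dist = np.zeros(Qs.shape[2])
    for g in range(L):
        # Concept observed g + 1 steps before the step being predicted.
        prev = history[-(g + 1)]
        dist += lambdas[g] * Qs[g, prev]
    return dist


def predict_future_concepts(history, lambdas, Qs, steps):
    """Greedily extend the observed concept sequence by `steps` concepts
    (hypothetical decoding rule; the paper may infer future concepts differently)."""
    seq = list(history)
    future = []
    for _ in range(steps):
        nxt = int(np.argmax(gmtd_next_distribution(seq, lambdas, Qs)))
        future.append(nxt)
        seq.append(nxt)  # feed the estimate back in as context for the next step
    return future


if __name__ == "__main__":
    # Toy example with K = 3 hypothetical concepts and L = 2 lags.
    cyclic = np.array([[0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8],
                       [0.8, 0.1, 0.1]])   # lag 1 favours 0 -> 1 -> 2 -> 0
    uniform = np.full((3, 3), 1 / 3)       # lag 2 carries no preference
    Qs = np.stack([cyclic, uniform])
    lambdas = [0.7, 0.3]
    print(gmtd_next_distribution([0, 1], lambdas, Qs))              # approx. [0.17 0.17 0.66]
    print(predict_future_concepts([0, 1], lambdas, Qs, steps=3))    # [2, 0, 1]
```

Because each mixture component is a row of a stochastic matrix and the weights sum to 1, the returned vector is itself a probability distribution; the number of parameters grows linearly in L rather than exponentially, as it would for a full order-L Markov chain.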
Acknowledgments
This work was supported in part by the Natural Science Foundation of China (NSFC) under Grant Nos. 61602320 and 61170185, the Liaoning Doctoral Startup Project under Grant Nos. 201601172 and 201601180, the Foundation of Liaoning Educational Committee under Grant Nos. L201607 and L2015403, and the Young Scholars Research Fund of SAU under Grant No. 15YB37.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Liu, C., Lu, Y., Shi, X., Li, Z., Zhao, L. (2017). Action Prediction Using Unsupervised Semantic Reasoning. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_50
DOI: https://doi.org/10.1007/978-3-319-70090-8_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70089-2
Online ISBN: 978-3-319-70090-8
eBook Packages: Computer Science, Computer Science (R0)