
Recognizing human actions by two-level Beta process hidden Markov model

  • Regular Paper
  • Published in Multimedia Systems

Abstract

We propose a method for human action recognition using latent topic models. There are two main differences between our method and previous latent topic models used for recognition problems. First, our model is trained in a supervised way, and we propose a two-level Beta process hidden Markov model which automatically identifies latent topics of action in video sequences. Second, we use the human skeleton to refine the spatial–temporal interest points that are extracted from video sequences. Because latent topics are derived from these interest points, the refined interest points can improve the precision of action recognition. Experimental results using the publicly available “Weizmann”, “KTH”, “UCF sport action”, “Hollywood2”, and “HMDB51” datasets demonstrate that our method outperforms other state-of-the-art methods.




Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Zhan Yi-Ju.

Additional information

Communicated by Y. Zhang.

Appendix


In this appendix, we provide detailed derivations of Eqs. (18) and (22).

Equation (18):

$$\begin{aligned} r(c'_{+ik},\mu'_{+}\mid c_{+ik},\mu_{+}) &= \frac{P(c'_{+ik},\mu'_{+}\mid c_{-ik},\pi^{(i)},\mu^{(i)})}{P(c_{+ik},\mu_{+}\mid c_{-ik},\pi^{(i)},\mu^{(i)})}\cdot\frac{q(c_{+ik},\mu_{+}\mid c'_{+ik},\mu'_{+})}{q(c'_{+ik},\mu'_{+}\mid c_{+ik},\mu_{+})}\\ &= \frac{P(\pi^{(i)}\mid[c_{-ik},c'_{+ik}],\mu^{(i)},\mu'_{+})}{P(\pi^{(i)}\mid[c_{-ik},c_{+ik}],\mu^{(i)},\mu_{+})}\,\frac{p(c'_{+ik})}{p(c_{+ik})}\,\frac{p(\mu'_{+})}{p(\mu_{+})}\,\frac{q_{c}(c_{+ik}\mid c'_{+ik})}{q_{c}(c'_{+ik}\mid c_{+ik})}\,\frac{q_{\mu}(\mu_{+}\mid c'_{+ik},c_{+ik},\mu'_{+})}{q_{\mu}(\mu'_{+}\mid c'_{+ik},c_{+ik},\mu_{+})}\\ &= \frac{P(\pi^{(i)}\mid[c_{-ik},c'_{+ik}],\mu^{(i)},\mu'_{+})}{P(\pi^{(i)}\mid[c_{-ik},c_{+ik}],\mu^{(i)},\mu_{+})}\,\frac{\operatorname{Poisson}(w_{ik}+1;\gamma_{kj}/N)\prod_{l=1}^{w_{ik}+1}P(\mu'_{+},l)}{\operatorname{Poisson}(w_{ik};\gamma_{kj}/N)\prod_{l=1}^{w_{ik}}P(\mu_{+},l)}\\ &\quad\cdot\frac{q_{c}(w_{ik}\leftarrow w_{ik}+1)}{q_{c}(w_{ik}+1\leftarrow w_{ik})\,p(\mu'_{+,w_{ik}+1})}\,\frac{\prod_{l=1}^{w_{ik}+1}\delta_{\mu'_{+},l}(\mu_{+},l)}{\prod_{l=1}^{w_{ik}}\delta_{\mu_{+},l}(\mu'_{+},l)}\\ &= \frac{P(\pi^{(i)}\mid[c_{-ik},c'_{+ik}],\mu^{(i)},\mu'_{+})\operatorname{Poisson}(w'_{ik}\mid\gamma_{kj}/N)\,q_{c}(w_{+ik}\mid w'_{+ik})}{P(\pi^{(i)}\mid[c_{-ik},c_{+ik}],\mu^{(i)},\mu_{+})\operatorname{Poisson}(w_{ik}\mid\gamma_{kj}/N)\,q_{c}(w'_{+ik}\mid w_{+ik})} \end{aligned}$$

Equation (22):

$$\begin{aligned} r(f'_{+i},\theta'_{+},\eta'_{+}\mid f_{+i},\theta_{+},\eta_{+}) &= \frac{P(f'_{+i},\theta'_{+},\eta'_{+}\mid f_{-i},x^{(i)}_{1:T_{i}},\theta_{1:K^{-i}_{+}},\eta^{(i)})}{P(f_{+i},\theta_{+},\eta_{+}\mid f_{-i},x^{(i)}_{1:T_{i}},\theta_{1:K^{-i}_{+}},\eta^{(i)})}\cdot\frac{q(f_{+i},\theta_{+},\eta_{+}\mid f'_{+i},\theta'_{+},\eta'_{+})}{q(f'_{+i},\theta'_{+},\eta'_{+}\mid f_{+i},\theta_{+},\eta_{+})}\\ &= \frac{P(x^{(i)}_{1:T_{i}}\mid[f_{-i},f'_{+i}],\theta_{1:K^{-i}_{+}},\eta^{(i)},\theta'_{+},\eta'_{+})}{P(x^{(i)}_{1:T_{i}}\mid[f_{-i},f_{+i}],\theta_{1:K^{-i}_{+}},\eta^{(i)},\theta_{+},\eta_{+})}\,\frac{p(f'_{+i})}{p(f_{+i})}\,\frac{p(\theta'_{+})}{p(\theta_{+})}\,\frac{p(\eta'_{+})}{p(\eta_{+})}\\ &\quad\cdot\frac{q_{f}(f_{+i}\mid f'_{+i})}{q_{f}(f'_{+i}\mid f_{+i})}\,\frac{q_{\theta}(\theta_{+}\mid f'_{+i},f_{+i},\theta'_{+})}{q_{\theta}(\theta'_{+}\mid f'_{+i},f_{+i},\theta_{+})}\,\frac{q_{\eta}(\eta_{+}\mid f'_{+i},f_{+i},\eta'_{+})}{q_{\eta}(\eta'_{+}\mid f'_{+i},f_{+i},\eta_{+})}\\ &= \frac{P(x^{(i)}_{1:T_{i}}\mid[f_{-i},f'_{+i}],\theta_{1:K^{-i}_{+}},\eta^{(i)},\theta'_{+},\eta'_{+})}{P(x^{(i)}_{1:T_{i}}\mid[f_{-i},f_{+i}],\theta_{1:K^{-i}_{+}},\eta^{(i)},\theta_{+},\eta_{+})}\,\frac{\operatorname{Poisson}(n_{i}+1;b_{k}/N)\prod_{k=1}^{n_{i}+1}P(\theta'_{+},k)P(\eta'_{+},k)}{\operatorname{Poisson}(n_{i};b_{k}/N)\prod_{k=1}^{n_{i}}P(\theta_{+},k)P(\eta_{+},k)}\\ &\quad\cdot\frac{q_{f}(n_{i}\leftarrow n_{i}+1)}{q_{f}(n_{i}+1\leftarrow n_{i})\,p(\theta'_{+,n_{i}+1})\,p(\eta'_{+,n_{i}+1})}\,\frac{\prod_{k=1}^{n_{i}}\delta_{\theta'_{+},k}(\theta_{+},k)\,\delta_{\eta'_{+},k}(\eta_{+},k)}{\prod_{k=1}^{n_{i}}\delta_{\theta_{+},k}(\theta'_{+},k)\,\delta_{\eta_{+},k}(\eta'_{+},k)}\\ &= \frac{P(x^{(i)}_{1:T_{i}}\mid[f_{-i},f'_{+i}],\eta^{(i)},\eta'_{+},\theta_{1:K^{-i}_{+}},\theta'_{+})\operatorname{Poisson}(n'_{i}\mid b_{k}/N)\,q_{f}(f_{+i}\mid f'_{+i})}{P(x^{(i)}_{1:T_{i}}\mid[f_{-i},f_{+i}],\eta^{(i)},\eta_{+},\theta_{1:K^{-i}_{+}},\theta_{+})\operatorname{Poisson}(n_{i}\mid b_{k}/N)\,q_{f}(f'_{+i}\mid f_{+i})} \end{aligned}$$
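Both derivations are Metropolis–Hastings acceptance ratios for birth moves that add a feature under a Poisson prior on the number of new features. As a rough illustration of how such a ratio is used in a sampler, the following Python sketch implements the generic log-space accept/reject step and the Poisson prior ratio for a birth move; the function names and numeric values are illustrative, not taken from the paper.

```python
import math
import random

def mh_accept(log_p_new, log_p_old, log_q_fwd, log_q_rev):
    """Accept/reject a Metropolis-Hastings proposal in log space.

    log_p_new / log_p_old: log target probability of proposed / current state
    log_q_fwd: log proposal density of moving current -> proposed
    log_q_rev: log proposal density of the reverse move
    Returns True if the proposed state is accepted.
    """
    log_r = (log_p_new - log_p_old) + (log_q_rev - log_q_fwd)
    # Accept with probability min(1, r); comparing against exp() avoids log(0).
    return random.random() < math.exp(min(0.0, log_r))

def log_poisson(k, rate):
    """log Poisson(k; rate): the prior over the number of new features."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

# Birth move w -> w + 1 for a feature count, with the Poisson(gamma/N)-style
# prior ratio appearing in the acceptance ratio; values are hypothetical.
gamma_over_N, w = 0.5, 2
log_prior_ratio = log_poisson(w + 1, gamma_over_N) - log_poisson(w, gamma_over_N)
```

In a full sampler, `log_prior_ratio` would be combined with the likelihood ratio (the leading `P(...)` fraction) and the proposal ratio before calling `mh_accept`; working in log space keeps the product of many small factors numerically stable.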


Cite this article

Lu, L., Yi-Ju, Z., Qing, J. et al. Recognizing human actions by two-level Beta process hidden Markov model. Multimedia Systems 23, 183–194 (2017). https://doi.org/10.1007/s00530-015-0474-5

