Recognizing interactions between human performers by ‘Dominating Pose Doublet’

  • Original Paper
Machine Vision and Applications

Abstract

A graph-theoretic approach is proposed to recognize interactions (e.g., handshaking, punching) between two human performers in a video. Pose descriptors are generated for each performer in the video and clustered to form initial codebooks of human poses. Compact codebooks of dominating poses are then created for each of the two performers by ranking the poses of the initial codebooks with two different methods. First, an average centrality measure of graph connectivity is introduced, in which the poses are the nodes of the graph. The dominating poses are the nodes that share a close semantic relationship with all other pose nodes and hence are expected to lie at the center of the graph. Second, a novel similarity measure is introduced for ranking the dominating poses. The ‘pose doublets’, that is, all possible combinations of the dominating poses of the two performers, are ranked using an improved centrality measure of a bipartite graph. The set of ‘dominating pose doublets’ that best represents the corresponding interaction is selected using a perceptual analysis technique. Recognition results on standard interaction datasets show the efficacy of the proposed approach compared with the state of the art.
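The first ranking step and the doublet construction can be illustrated in a few lines. This is a minimal Python sketch under stated assumptions, not the authors' method: each pose descriptor is taken to be a plain numeric vector, the edge weights of a complete pose graph come from a Gaussian kernel with a hypothetical `bandwidth` parameter, and the "average centrality" of a node is assumed to be its mean edge weight to all other nodes. The paper's own centrality and similarity measures, and its bipartite ranking of the doublets, are not reproduced; `pose_doublets` only enumerates the candidate doublets. All function names are illustrative.

```python
import itertools
import math


def similarity(a, b, bandwidth=1.0):
    """Gaussian similarity between two pose descriptors (an assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def average_centrality(poses, bandwidth=1.0):
    """Mean edge weight from each pose node to every other node of a complete graph."""
    n = len(poses)
    if n < 2:
        raise ValueError("need at least two poses to build a graph")
    return [
        sum(similarity(poses[i], poses[j], bandwidth) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def dominating_poses(poses, k, bandwidth=1.0):
    """Indices of the k poses with the highest average centrality."""
    scores = average_centrality(poses, bandwidth)
    return sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)[:k]


def pose_doublets(dominating_1, dominating_2):
    """All (performer-1 pose, performer-2 pose) pairs, i.e. the candidate 'pose doublets'."""
    return list(itertools.product(dominating_1, dominating_2))


# Toy codebook: three nearby poses and one outlier (made-up 2-D descriptors).
codebook = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
print(dominating_poses(codebook, 3))  # the outlier (index 3) is not selected
```

Poses close to many others receive high average similarity and so sit at the "center" of the graph, while an isolated pose scores near zero; this mirrors the intuition stated above, although the actual measure in the paper may differ.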

Figs. 1–8 (images not included)

References

  1. Lan, T., Sigal, L., Mori, G.: Social roles in hierarchical models for human activity recognition. In: IEEE-CVPR, pp. 1355–1362 (2012)

  2. Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing human action at a distance in video by key poses. IEEE T-CSVT 21(9), 1228–1241 (2011)

  3. Narayan, B.L., Murthy, C.A., Pal, S.K.: Maxdiff kd-trees for data condensation. Pattern Recognit. Lett. 27(3), 187–200 (2006)

  4. Yao, B., Fei-Fei, L.: Action recognition with exemplar based 2.5D graph matching. In: ECCV, LNCS 7575, pp. 173–186. Springer, Berlin (2012)

  5. Desolneux, A., Moisan, L., Morel, J.-M.: From Gestalt Theory to Image Analysis: A Probabilistic Approach. Springer, Berlin (2008)

  6. Liu, J., Ali, S., Shah, M.: Recognizing human actions using multiple features. In: CVPR. IEEE Computer Society (2008)

  7. Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vis. 79(3), 299–318 (2008)

  8. Liu, C., Yuen, P.C.: Human action recognition using boosted eigenactions. Image Vis. Comput. 28(5), 825–835 (2010)

  9. Mori, G., Ren, X., Efros, A., Malik, J.: Recovering human body configurations: Combining segmentation and recognition. In: CVPR, vol. 2, pp. 326–333. IEEE Computer Society (2004)

  10. Mori, G., Malik, J.: Estimating human body configurations using shape context matching. In: ECCV, LNCS 2352, vol. 3, pp. 666–680. Springer (2002)

  11. Han, D., Bo, L., Sminchisescu, C.: Selection and context for action recognition. In: ICCV, pp. 1933–1940. IEEE Computer Society (2009)

  12. Wang, Y., Mori, G.: Hidden part models for human action recognition: probabilistic vs. max-margin. IEEE T-PAMI 33(7), 1310–1323 (2011)

  13. Poppe, R.: Machine recognition of human activities: a survey. Image Vis. Comput. 28(6), 976–990 (2010)

  14. Ryoo, M.S., Aggarwal, J.K., et al.: Human activity analysis: a review. ACM Comput. Surv. 43(3), 16:1–16:43 (2011)

  15. Wang, Y., Mori, G.: Human action recognition by semi-latent topic models. IEEE T-PAMI 31(10), 1762–1774 (2009)

  16. Ikizler, N., Duygulu, P.: Histogram of oriented rectangles: a new pose descriptor for human action recognition. Image Vis. Comput. 27(10), 1515–1526 (2009)

  17. Lv, F., Nevatia, R.: Single view human action recognition using key pose matching and Viterbi path searching. In: CVPR. IEEE Computer Society (2007)

  18. Du, Y., Chen, F., Xu, W.: Human interaction representation and recognition through motion decomposition. IEEE Signal Process. Lett. 14(12), 952–955 (2007)

  19. Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: IEEE ICCV (2009)

  20. Ni, B., Pei, Y., Liang, Z., Lin, L., Moulin, P.: Integrating Multi-stage depth-induced contextual information for human action recognition and localization. In: IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, pp. 1–8 (2013)

  21. Patron-Perez, A., Marszalek, M., Zisserman, A., Reid, I.: High five: recognising human interactions in TV shows. In: British Machine Vision Conference (2010)

  22. Meng, L., Qing, L., Yang, P., Miao, J., Chen, X., Metaxas, D.N.: Activity recognition based on semantic spatial relation. In: International Conference on Pattern Recognition (IEEE-ICPR), pp. 609–612 (2012)

  23. Tang, K., Fei-Fei, L., Koller, D.: Learning latent temporal structure for complex event detection. In: IEEE-CVPR, pp. 1–8 (2012)

  24. Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing interaction between human performers using ‘key pose doublet’. In: ACM Multimedia Conference, Arizona, pp. 1329–1332. ACM (2011)

  25. Ryoo, M.S., Aggarwal, J.K.: UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA). http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html. Accessed 2012

  26. Wolf, C., Mille, J., Lombardi, L.E., Celiktutan, O., Jiu, M., Baccouche, M., Dellandrea, E., Bichot, C., Garcia, C.-E., Sankur, B.: The LIRIS Human Activities Dataset and the ICPR 2012 Human Activities Recognition and Localization Competition. Technical Report RR-LIRIS-2012-004, LIRIS Laboratory (2012)

  27. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, pp. 886–893. IEEE Computer Society (2005)

  28. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2003)

  29. Navigli, R., Lapata, M.: An experimental study of graph connectivity for unsupervised word sense disambiguation. IEEE T-PAMI 32(4), 678–692 (2010)

  30. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

  31. Varadhan, S.R.S.: Asymptotic probabilities and differential equations. Commun. Pure Appl. Math. 19(3), 261–286 (1966)

  32. Platt, J.C.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers, pp. 61–74. MIT Press (1999)

Acknowledgments

Certain commercial equipment, instruments, software, or materials are identified in this article to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the equipment, instruments, software or materials are necessarily the best available for the purpose.

Author information

Correspondence to Snehasis Mukherjee.

Appendix A: Deriving equation (10) [30]

In our problem of detecting meaningful ‘pose doublets’ for an interaction \(A_n\), let \(\varDelta \) be the number of ‘pose doublets’ in \(A_n\). We can then formulate the problem with a sequence of i.i.d. random variables \(\{X_q\}_{q=1,2,3,\ldots ,\varDelta }\) such that \(0\le X_q\le 1\). Let us define \(X_q\) as

$$\begin{aligned} X_q ={\left\{ \begin{array}{ll} 1 &{} \hbox {when} \; E(u,v)<\eta \; \hbox {for the `pose doublet'}\; I_q \\ 0 &{} \hbox {otherwise}, \end{array}\right. } \end{aligned}$$
(17)

for a given \(\eta \), where \(I_q\) is the \(q\)th ‘pose doublet’ in \(A_n\). We set \(S_{\varDelta }=\sum _{q=1}^{\varDelta }X_q\), the number of ‘pose doublets’ in \(A_n\) with \(E(u,v)\) less than \(\eta \), and \(\nu \varDelta =E\left[ S_{\varDelta }\right] \), so that \(\nu =P(X_q=1)\). Then, for \(\nu \varDelta < t < \varDelta \) (a nonempty range, since \(\nu \) is a probability less than 1), putting \(\sigma =\frac{t}{\varDelta }\) as in [5], Hoeffding’s inequality [30] gives

$$\begin{aligned} P_n^{\eta }=P(S_{\varDelta }\ge t)\le e^{-\varDelta \left( \sigma \log {\frac{\sigma }{\nu }}+(1-\sigma ) \log {\frac{1-\sigma }{1-\nu }}\right) }. \end{aligned}$$
(18)
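The indicator variables of (17) and the tail bound (18) can be checked numerically. The Python sketch below is illustrative only: the \(E(u,v)\) values are made up, the \(X_q\) are modeled as i.i.d. Bernoulli(\(\nu \)) variables (consistent with the i.i.d. assumption above), and the helper names `indicator_variables`, `kl_rate`, `hoeffding_bound` and `binomial_tail` are hypothetical. It compares the right-hand side of (18) with the exact binomial tail \(P(S_{\varDelta }\ge t)\).

```python
import math


def indicator_variables(edge_weights, eta):
    """X_q of Eq. (17): 1 if E(u, v) < eta for the q-th doublet, else 0."""
    return [1 if w < eta else 0 for w in edge_weights]


def kl_rate(sigma, nu):
    """Exponent rate sigma*log(sigma/nu) + (1-sigma)*log((1-sigma)/(1-nu)) in (18)."""
    return sigma * math.log(sigma / nu) + (1 - sigma) * math.log((1 - sigma) / (1 - nu))


def hoeffding_bound(t, delta, nu):
    """Right-hand side of (18); only valid for nu*delta < t < delta."""
    if not nu * delta < t < delta:
        raise ValueError("t must satisfy nu*delta < t < delta")
    sigma = t / delta
    return math.exp(-delta * kl_rate(sigma, nu))


def binomial_tail(t, delta, nu):
    """Exact P(S_delta >= t) when X_1, ..., X_delta are i.i.d. Bernoulli(nu)."""
    return sum(
        math.comb(delta, k) * nu ** k * (1 - nu) ** (delta - k)
        for k in range(math.ceil(t), delta + 1)
    )


# Made-up E(u, v) values for five doublets and a cut-off eta = 0.5.
xs = indicator_variables([0.2, 0.7, 0.4, 0.9, 0.1], eta=0.5)
print(xs, sum(xs))  # sum(xs) plays the role of S_Delta

# Compare the exact tail with the bound (18) for delta = 20, nu = 0.3.
for t in range(7, 20):
    print(t, binomial_tail(t, 20, 0.3), hoeffding_bound(t, 20, 0.3))
```

Every exact tail value printed in the loop stays below the corresponding bound, as (18) guarantees; the bound is loose near \(t=\nu \varDelta \) and tightens in relative terms as \(t\) grows.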

In addition, the right-hand side of this inequality satisfies

$$\begin{aligned} e^{-\varDelta \left( \sigma \log {\frac{\sigma }{\nu }}+(1-\sigma ) \log {\frac{1-\sigma }{1-\nu }}\right) }\le e^{-\varDelta (\sigma -\nu )^2H(\nu )}, \end{aligned}$$
(19)

where,

$$\begin{aligned} H(\nu ) ={\left\{ \begin{array}{ll}\frac{1}{1-2\nu }\log {\frac{1-\nu }{\nu }} &{} \hbox {when}\; 0<\nu <\frac{1}{2} \\ \frac{1}{2\nu (1-\nu )} &{} \hbox {when}\; \frac{1}{2}\le \nu <1 \end{array}\right. } \end{aligned}$$
(20)

Inequalities (18) and (19), with \(H(\nu )\) given by (20), are due to Hoeffding [30], where their proof can be found.
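Inequality (19) and the lower bound \(H(\nu )\ge 2\) used later can be spot-checked numerically. The following Python sketch implements \(H(\nu )\) exactly as in (20) and tests (19) on a grid of \(\sigma \) values in \((\nu ,1)\); the grid size, the tolerance and the function names are arbitrary choices for illustration, and a grid check is of course not a proof.

```python
import math


def H(nu):
    """Hoeffding's H(nu) from Eq. (20), defined for 0 < nu < 1."""
    if not 0 < nu < 1:
        raise ValueError("nu must lie in (0, 1)")
    if nu < 0.5:
        return math.log((1 - nu) / nu) / (1 - 2 * nu)
    return 1.0 / (2 * nu * (1 - nu))


def kl_rate(sigma, nu):
    """Exponent on the left-hand side of (19), without the factor -Delta."""
    return sigma * math.log(sigma / nu) + (1 - sigma) * math.log((1 - sigma) / (1 - nu))


def inequality_19_holds(nu, steps=200, tol=1e-12):
    """Check kl_rate(sigma, nu) >= (sigma - nu)**2 * H(nu) for sigma on a grid in (nu, 1)."""
    for i in range(1, steps):
        sigma = nu + (1 - nu) * i / steps
        if kl_rate(sigma, nu) < (sigma - nu) ** 2 * H(nu) - tol:
            return False
    return True


print(H(0.5))                                           # minimum value of H
print(min(H(i / 100) for i in range(1, 100)))           # stays >= 2 on the grid
print(all(inequality_19_holds(i / 20) for i in range(1, 20)))
```

For \(\nu <\frac{1}{2}\), (19) holds with equality at \(\sigma =1-\nu \), which is why a small tolerance is used in the comparison.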

We then apply (18), (19) and (20) to find a sufficient condition for \(\epsilon \)-meaningfulness. If we take \(t\) satisfying both of the inequalities \(t\ge \nu \varDelta +\sqrt{\frac{\log {\tau } - \log {\epsilon }}{H(\nu )}}\sqrt{\varDelta }\) and \(t<\varDelta \), then, putting \(\sigma =\frac{t}{\varDelta }\) and rearranging the first inequality, we get

$$\begin{aligned} \varDelta (\sigma -\nu )^2 \ge \frac{\log {\tau }-\log {\epsilon }}{H(\nu )}. \end{aligned}$$
(21)
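The algebra behind (21) is short and is written out below, assuming \(\epsilon \le \tau \) so that the square root is real. The first step substitutes \(t=\sigma \varDelta \), the second squares both sides (both are nonnegative), and the last divides by \(\varDelta >0\).

$$\begin{aligned} t \ge \nu \varDelta +\sqrt{\frac{\log {\tau } - \log {\epsilon }}{H(\nu )}}\sqrt{\varDelta } &\;\Longleftrightarrow \; \varDelta (\sigma -\nu ) \ge \sqrt{\frac{\log {\tau } - \log {\epsilon }}{H(\nu )}}\sqrt{\varDelta } \\ &\;\Longrightarrow \; \varDelta ^2(\sigma -\nu )^2 \ge \frac{\log {\tau } - \log {\epsilon }}{H(\nu )}\,\varDelta \\ &\;\Longleftrightarrow \; \varDelta (\sigma -\nu )^2 \ge \frac{\log {\tau } - \log {\epsilon }}{H(\nu )}. \end{aligned}$$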

Combining (18), (19) and (21), we get

$$\begin{aligned} P_n^{\eta }\le e^{-\varDelta (\sigma -\nu )^2H(\nu )}\le e^{-\log {\tau }+\log {\epsilon }}=\frac{\epsilon }{\tau }. \end{aligned}$$
(22)

By the definition of meaningfulness in Eq. (8), this means that the cut-off \(\eta \) is \(\epsilon \)-meaningful.

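Since \(H(\nu )\ge 2\) for every \(\nu \) in \((0,1)\) (by Eq. (20), with equality at \(\nu =\frac{1}{2}\)), any \(t\) that satisfies the first inequality on \(t\) above with \(H(\nu )\) replaced by 2 also satisfies it for the actual \(H(\nu )\), and hence satisfies (21). This gives the sufficient condition of meaningfulness (10).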
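The effect of replacing \(H(\nu )\) by its lower bound 2 can be seen numerically. The Python sketch below uses arbitrary values for \(\varDelta \), \(\nu \), \(\epsilon \) and \(\tau \) (hypothetical, not taken from the paper, with \(\epsilon \le \tau \)); it computes the threshold on \(t\) once with the exact \(H(\nu )\) and once with \(H(\nu )=2\), then evaluates the middle term of (22) at each threshold. The helper names `threshold` and `middle_term_22` are illustrative only.

```python
import math


def H(nu):
    """Hoeffding's H(nu), Eq. (20)."""
    if not 0 < nu < 1:
        raise ValueError("nu must lie in (0, 1)")
    if nu < 0.5:
        return math.log((1 - nu) / nu) / (1 - 2 * nu)
    return 1.0 / (2 * nu * (1 - nu))


def threshold(delta, nu, eps, tau, h):
    """Lower bound on t: nu*delta + sqrt((log tau - log eps) / h) * sqrt(delta)."""
    return nu * delta + math.sqrt((math.log(tau) - math.log(eps)) / h) * math.sqrt(delta)


def middle_term_22(t, delta, nu):
    """exp(-delta * (sigma - nu)**2 * H(nu)) with sigma = t / delta, as in (22)."""
    sigma = t / delta
    return math.exp(-delta * (sigma - nu) ** 2 * H(nu))


# Arbitrary illustrative values (not from the paper's experiments).
delta, nu, eps, tau = 100, 0.2, 1.0, 50.0
t_exact = threshold(delta, nu, eps, tau, H(nu))  # threshold with the actual H(nu)
t_suff = threshold(delta, nu, eps, tau, 2.0)     # H(nu) replaced by its lower bound 2
print(t_exact, t_suff)
print(middle_term_22(t_exact, delta, nu), middle_term_22(t_suff, delta, nu), eps / tau)
```

At the exact threshold the middle term of (22) equals \(\epsilon /\tau \), while the threshold obtained with \(H(\nu )=2\) is larger and gives a bound that is at most \(\epsilon /\tau \). The price of the simpler condition is a somewhat more conservative cut-off.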

About this article

Cite this article

Mukherjee, S., Biswas, S.K. & Mukherjee, D.P. Recognizing interactions between human performers by ‘Dominating Pose Doublet’. Machine Vision and Applications 25, 1033–1052 (2014). https://doi.org/10.1007/s00138-013-0589-7

  • DOI: https://doi.org/10.1007/s00138-013-0589-7
