Abstract
A graph theoretic approach is proposed to recognize interactions (e.g., handshaking and punching) between two human performers in a video. Pose descriptors corresponding to each performer in the video are generated and clustered to form initial codebooks of human poses. Compact codebooks of dominating poses for each of the two performers are created by ranking the poses of the initial codebooks using two different methods. First, an average centrality measure of graph connectivity is introduced, in which poses are the nodes of the graph. The dominating poses are graph nodes sharing a close semantic relationship with all other pose nodes and are therefore expected to lie in the central part of the graph. Second, a novel similarity measure is introduced for ranking dominating poses. The ‘pose doublets’, i.e., all possible pairings of a dominating pose of one performer with a dominating pose of the other, are ranked using an improved centrality measure of a bipartite graph. The set of ‘dominating pose doublets’ that best represents the corresponding interaction is selected using a perceptual analysis technique. Recognition results on standard interaction datasets show the efficacy of the proposed approach compared with state-of-the-art methods.
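The abstract names the ranking steps without giving their formulas. As a rough illustration only, the Python sketch below ranks codebook poses by a simple centrality score on a complete graph whose edge weights are pairwise descriptor distances, keeps the top-k poses of each performer as ‘dominating’ poses, and forms the ‘pose doublets’ as every cross-performer pairing. The score (inverse mean distance to all other poses), the Euclidean distance, k, the toy data and all function names are assumptions; the paper's own average centrality measure, similarity measure and bipartite ranking are not reproduced here.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# A pose descriptor is assumed to be a fixed-length vector (e.g. a codebook centroid).
Pose = Sequence[float]


def euclidean(a: Pose, b: Pose) -> float:
    """Assumed dissimilarity between two pose descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centrality_scores(poses: List[Pose]) -> List[float]:
    """Hypothetical centrality on the complete pose graph: the inverse of a
    node's mean distance to all other nodes (larger means more central)."""
    n = len(poses)
    if n < 2:
        return [1.0] * n
    scores = []
    for i in range(n):
        total = sum(euclidean(poses[i], poses[j]) for j in range(n) if j != i)
        mean_dist = total / (n - 1)
        scores.append(1.0 / (mean_dist + 1e-12))  # small constant avoids division by zero
    return scores


def dominating_poses(poses: List[Pose], k: int) -> List[int]:
    """Indices of the k most central poses, most central first."""
    scores = centrality_scores(poses)
    ranked = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def pose_doublets(dom_a: List[int], dom_b: List[int]) -> List[Tuple[int, int]]:
    """Every pairing of a dominating pose of performer A with one of performer B."""
    return list(itertools.product(dom_a, dom_b))


# Toy 2-D descriptors for the two performers (made-up values).
performer_a = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.1], [5.0, 5.0]]
performer_b = [[0.0, 1.0], [0.2, 1.1], [3.0, 3.0]]
dom_a = dominating_poses(performer_a, k=2)  # [2, 1]
dom_b = dominating_poses(performer_b, k=2)  # [1, 0]
print(pose_doublets(dom_a, dom_b))          # [(2, 1), (2, 0), (1, 1), (1, 0)]
```

On a complete graph with Euclidean edge weights the direct edge is always a shortest path, so this score orders nodes the same way as closeness centrality; it was chosen here only because it needs no graph library.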
References
1. Lan, T., Sigal, L., Mori, G.: Social roles in hierarchical models for human activity recognition. In: IEEE-CVPR, pp. 1355–1362 (2012)
2. Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing human action at a distance in video by key poses. IEEE T-CSVT 21(9), 1228–1241 (2011)
3. Narayan, B.L., Murthy, C.A., Pal, S.K.: Maxdiff kd-trees for data condensation. Pattern Recognit. Lett. 27(3), 187–200 (2006)
4. Yao, B., Fei-Fei, L.: Action recognition with exemplar based 2.5D graph matching. In: ECCV, LNCS 7575, pp. 173–186. Springer, Berlin (2012)
5. Desolneux, A., Moisan, L., Morel, J.-M.: From Gestalt Theory to Image Analysis: A Probabilistic Approach. Springer, Berlin (2008)
6. Liu, J., Ali, S., Shah, M.: Recognizing human actions using multiple features. In: CVPR. IEEE Computer Society (2008)
7. Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vis. 79(3), 299–318 (2008)
8. Liu, C., Yuen, P.C.: Human action recognition using boosted eigenactions. Image Vis. Comput. 28(5), 825–835 (2010)
9. Mori, G., Ren, X., Efros, A., Malik, J.: Recovering human body configurations: combining segmentation and recognition. In: CVPR, vol. 2, pp. 326–333. IEEE Computer Society (2004)
10. Mori, G., Malik, J.: Estimating human body configurations using shape context matching. In: ECCV, LNCS 2352, vol. 3, pp. 666–680. Springer (2002)
11. Han, D., Bo, L., Sminchisescu, C.: Selection and context for action recognition. In: ICCV, pp. 1933–1940. IEEE Computer Society (2009)
12. Wang, Y., Mori, G.: Hidden part models for human action recognition: probabilistic vs. max-margin. IEEE T-PAMI 33(7), 1310–1323 (2011)
13. Poppe, R.: Machine recognition of human activities: a survey. Image Vis. Comput. 28(6), 976–990 (2010)
14. Ryoo, M.S., Aggarwal, J.K., et al.: Human activity analysis: a review. ACM Comput. Surv. 43(3), 16:1–16:43 (2011)
15. Wang, Y., Mori, G.: Human action recognition by semi-latent topic models. IEEE T-PAMI 31(10), 1762–1774 (2009)
16. Ikizler, N., Duygulu, P.: Histogram of oriented rectangles: a new pose descriptor for human action recognition. Image Vis. Comput. 27(10), 1515–1526 (2009)
17. Lv, F., Nevatia, R.: Single view human action recognition using key pose matching and Viterbi path searching. In: CVPR. IEEE Computer Society (2007)
18. Du, Y., Chen, F., Xu, W.: Human interaction representation and recognition through motion decomposition. IEEE Signal Process. Lett. 14(12), 952–955 (2007)
19. Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: IEEE ICCV (2009)
20. Ni, B., Pei, Y., Liang, Z., Lin, L., Moulin, P.: Integrating multi-stage depth-induced contextual information for human action recognition and localization. In: IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, pp. 1–8 (2013)
21. Patron-Perez, A., Marszalek, M., Zisserman, A., Reid, I.: High five: recognising human interactions in TV shows. In: British Machine Vision Conference (2010)
22. Meng, L., Qing, L., Yang, P., Miao, J., Chen, X., Metaxas, D.N.: Activity recognition based on semantic spatial relation. In: International Conference on Pattern Recognition (IEEE-ICPR), pp. 609–612 (2012)
23. Tang, K., Fei-Fei, L., Koller, D.: Learning latent temporal structure for complex event detection. In: IEEE-CVPR, pp. 1–8 (2012)
24. Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing interaction between human performers using ‘key pose doublet’. In: ACM Multimedia Conference, Arizona, pp. 1329–1332. ACM (2011)
25. Ryoo, M.S., Aggarwal, J.K.: UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA). http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html. Accessed 2012
26. Wolf, C., Mille, J., Lombardi, L.E., Celiktutan, O., Jiu, M., Baccouche, M., Dellandrea, E., Bichot, C., Garcia, C.-E., Sankur, B.: The LIRIS Human Activities Dataset and the ICPR 2012 Human Activities Recognition and Localization Competition. Technical Report RR-LIRIS-2012-004, LIRIS Laboratory (2012)
27. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, pp. 886–893. IEEE Computer Society (2005)
28. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2003)
29. Navigli, R., Lapata, M.: An experimental study of graph connectivity for unsupervised word sense disambiguation. IEEE T-PAMI 32(4), 678–692 (2010)
30. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
31. Varadhan, S.R.S.: Asymptotic probabilities and differential equations. Commun. Pure Appl. Math. 19(3), 261–286 (1966)
32. Platt, J.C.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers, pp. 61–74. MIT Press (1999)
Acknowledgments
Certain commercial equipment, instruments, software, or materials are identified in this article to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the equipment, instruments, software or materials are necessarily the best available for the purpose.
Appendix A: Deriving equation (10) [30]
In our problem of detecting meaningful ‘pose doublets’ for an interaction \(A_n\), let \(\varDelta \) be the number of ‘pose doublets’ in \(A_n\). We can then formulate the problem in terms of a sequence of i.i.d. random variables \(\{X_q\}_{q=1,2,3,\ldots ,\varDelta }\) such that \(0\le X_q\le 1\). Let us define \(X_q\) as
\[ X_q=\begin{cases} 1, & \text{if } I_q=(u,v) \text{ has } E(u,v)<\eta ,\\ 0, & \text{otherwise,} \end{cases} \]
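for a given \(\eta \), where \(I_q\) is the \(q\)th ‘pose doublet’ in \(A_n\). We set \(S_{\varDelta }=\sum _{q=1}^{\varDelta }X_q\), so that \(S_{\varDelta }\) is the number of ‘pose doublets’ in \(A_n\) having \(E(u,v)\) less than \(\eta \), and \(\nu \varDelta =E\left[ S_{\varDelta }\right] \). Then for \(\nu \varDelta < t < \varDelta \) (this interval is non-empty because \(\nu \) is a probability value \(<1\)), putting \(\sigma =\frac{t}{\varDelta }\) as in [5], Hoeffding’s inequality [30] gives
\[ P\left[ S_{\varDelta }\ge t\right] \le \exp \left( -\varDelta \left[ \sigma \log \frac{\sigma }{\nu }+(1-\sigma )\log \frac{1-\sigma }{1-\nu }\right] \right) . \tag{18} \]
In addition, the right-hand term of this inequality satisfies
\[ \exp \left( -\varDelta \left[ \sigma \log \frac{\sigma }{\nu }+(1-\sigma )\log \frac{1-\sigma }{1-\nu }\right] \right) \le \exp \left( -\varDelta (\sigma -\nu )^2 H(\nu )\right) , \tag{19} \]
where
\[ H(\nu )=\begin{cases} \dfrac{1}{1-2\nu }\log \dfrac{1-\nu }{\nu }, & 0<\nu <\frac{1}{2},\\[6pt] \dfrac{1}{2\nu (1-\nu )}, & \frac{1}{2}\le \nu <1. \end{cases} \tag{20} \]
Inequalities (18) and (19) together constitute Hoeffding’s inequality, a standard result in probability theory; for its proof, the reader may refer to [30].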
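The two Hoeffding bounds can be checked numerically. The sketch below, an illustration rather than code from the paper, evaluates the relative-entropy exponent \(\sigma \log (\sigma /\nu )+(1-\sigma )\log \frac{1-\sigma }{1-\nu }\) of Hoeffding’s tail bound for \(\varDelta \) i.i.d. variables in \([0,1]\) with mean \(\nu \), and Hoeffding’s function \(H(\nu )\) (equal to \(\frac{1}{1-2\nu }\log \frac{1-\nu }{\nu }\) for \(\nu <\frac{1}{2}\) and to \(\frac{1}{2\nu (1-\nu )}\) otherwise), then confirms on a grid that the exponent is at least \((\sigma -\nu )^2H(\nu )\) and that \(H(\nu )\ge 2\). The function names and the grid are hypothetical choices.

```python
import math


def kl_exponent(sigma: float, nu: float) -> float:
    """sigma*log(sigma/nu) + (1 - sigma)*log((1 - sigma)/(1 - nu)), for 0 < nu < sigma < 1."""
    return sigma * math.log(sigma / nu) + (1 - sigma) * math.log((1 - sigma) / (1 - nu))


def H(nu: float) -> float:
    """Hoeffding's function of the mean nu in (0, 1); its minimum value is 2, at nu = 1/2."""
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie in (0, 1)")
    if nu < 0.5:
        return math.log((1 - nu) / nu) / (1 - 2 * nu)
    return 1.0 / (2 * nu * (1 - nu))


def tail_bound(delta: int, nu: float, t: float) -> float:
    """Hoeffding bound on P[S_delta >= t] for nu*delta < t < delta."""
    return math.exp(-delta * kl_exponent(t / delta, nu))


def quadratic_bound(delta: int, nu: float, t: float) -> float:
    """Weaker bound exp(-delta * (sigma - nu)^2 * H(nu)) with sigma = t/delta."""
    sigma = t / delta
    return math.exp(-delta * (sigma - nu) ** 2 * H(nu))


# Grid check: the exponent dominates (sigma - nu)^2 * H(nu), and H(nu) >= 2.
grid = [i / 20 for i in range(1, 20)]
for nu in grid:
    assert H(nu) >= 2.0
    for sigma in grid:
        if sigma > nu:
            assert kl_exponent(sigma, nu) >= (sigma - nu) ** 2 * H(nu) - 1e-12

# Example: 50 doublets, nu = 0.2, t = 20; the first bound is the tighter one.
print(tail_bound(50, 0.2, 20), quadratic_bound(50, 0.2, 20))
```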
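We now apply (18), (19) and (20) to find a sufficient condition for \(\epsilon \)-meaningfulness. If we take \(t\) satisfying both of the inequalities \(t\ge \nu \varDelta +\sqrt{\frac{\log {\tau } - \log {\epsilon }}{H(\nu )}}\sqrt{\varDelta }\) and \(t<\varDelta \), then, using (18) and (19) and putting \(\sigma =\frac{t}{\varDelta }\), we get
\[ \exp \left( -\varDelta \left[ \sigma \log \frac{\sigma }{\nu }+(1-\sigma )\log \frac{1-\sigma }{1-\nu }\right] \right) \le \exp \left( -\frac{(t-\nu \varDelta )^2H(\nu )}{\varDelta }\right) \le \frac{\epsilon }{\tau }. \tag{21} \]
Then, using (18) and (21), we get
\[ \tau \, P\left[ S_{\varDelta }\ge t\right] \le \epsilon . \tag{22} \]
By the definition of meaningfulness (Eq. (8)), this means that the cut-off \(\eta \) is \(\epsilon \)-meaningful.
Since \(H(\nu )\ge 2\) for \(\nu \) in \((0,1)\) (from Eq. (20)), any \(t\) with \(t\ge \nu \varDelta +\sqrt{\frac{\log \tau -\log \epsilon }{2}}\sqrt{\varDelta }\) also satisfies the first inequality on \(t\) above, so from (21) we get the sufficient condition of meaningfulness given in (10).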
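The sufficient condition can be evaluated directly from data. In the sketch below, which is illustrative and not taken from the paper, \(S_{\varDelta }\) is computed as the number of doublets whose \(E(u,v)\) is below \(\eta \), and it is compared with the threshold obtained from the inequality on \(t\) above after replacing \(H(\nu )\) by its lower bound 2. The values of \(E(u,v)\), \(\nu \), \(\tau \) and \(\epsilon \) are made up, \(\nu \) and \(\tau \) are simply taken as given constants, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def count_below(energies: Sequence[float], eta: float) -> int:
    """S_Delta: number of pose doublets whose E(u, v) is below eta (the sum of the X_q)."""
    return sum(1 for e in energies if e < eta)


def sufficient_count(delta: int, nu: float, tau: float, eps: float) -> float:
    """Threshold nu*Delta + sqrt((log tau - log eps) / 2) * sqrt(Delta),
    i.e. the inequality on t with H(nu) replaced by its lower bound 2."""
    return nu * delta + math.sqrt((math.log(tau) - math.log(eps)) / 2.0) * math.sqrt(delta)


def passes_sufficient_condition(energies: Sequence[float], eta: float,
                                nu: float, tau: float, eps: float) -> bool:
    """Hypothetical helper: True when the observed count reaches the threshold
    while staying below Delta, as the derivation assumes t < Delta."""
    delta = len(energies)
    s = count_below(energies, eta)
    return sufficient_count(delta, nu, tau, eps) <= s < delta


# Made-up E(u, v) values for ten pose doublets of one interaction.
energies = [0.1, 0.2, 0.05, 0.9, 0.15, 0.3, 0.12, 0.08, 0.6, 0.02]
print(count_below(energies, 0.25))                                # 7
print(passes_sufficient_condition(energies, 0.25, 0.2, 10, 1.0))  # True  (threshold about 5.39)
print(passes_sufficient_condition(energies, 0.10, 0.2, 10, 1.0))  # False (only 3 below 0.10)
```

At the threshold itself, \(\tau \exp (-2(t-\nu \varDelta )^2/\varDelta )\) equals \(\epsilon \) exactly, so any larger count keeps \(\tau \) times the quadratic bound at or below \(\epsilon \).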
Cite this article
Mukherjee, S., Biswas, S.K. & Mukherjee, D.P. Recognizing interactions between human performers by ‘Dominating Pose Doublet’. Machine Vision and Applications 25, 1033–1052 (2014). https://doi.org/10.1007/s00138-013-0589-7
DOI: https://doi.org/10.1007/s00138-013-0589-7