Recognizing interactions between human performers by ‘Dominating Pose Doublet’

Mukherjee, Snehasis; Biswas, Sujoy Kumar; Mukherjee, Dipti Prasad

doi:10.1007/s00138-013-0589-7

Original Paper
Published: 03 January 2014

Machine Vision and Applications, Volume 25, pages 1033–1052 (2014)

Snehasis Mukherjee¹,
Sujoy Kumar Biswas² &
Dipti Prasad Mukherjee³


Abstract

A graph theoretic approach is proposed to recognize interactions (e.g., handshaking, punching) between two human performers in a video. Pose descriptors corresponding to each performer in the video are generated and clustered to form initial codebooks of human poses. Compact codebooks of dominating poses for each of the two performers are created by ranking the poses of the initial codebooks using two different methods. First, an average centrality measure of graph connectivity is introduced, where poses are the nodes of the graph. The dominating poses are graph nodes sharing a close semantic relationship with all other pose nodes and hence are expected to lie at the central part of the graph. Second, a novel similarity measure is introduced for ranking dominating poses. The ‘pose doublets’, all possible combinations of dominating poses of the two performers, are ranked using an improved centrality measure of a bipartite graph. The set of ‘dominating pose doublets’ that best represents the corresponding interaction is selected using the perceptual analysis technique. Recognition results on standard interaction datasets show the efficacy of the proposed approach compared with the state-of-the-art.
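Two steps outlined in the abstract can be made concrete: ranking the poses of a codebook by how central they are in a pose graph, and pairing the dominating poses of the two performers. A minimal Python sketch, assuming a precomputed symmetric pose-similarity matrix, ranks nodes by their mean similarity to all other nodes and forms the ‘pose doublets’ as a Cartesian product. The mean-similarity score is a simple stand-in chosen for illustration; it is not the paper's average centrality measure or its novel similarity measure, and the function names and matrix values are hypothetical.

```python
from itertools import product


def rank_by_average_similarity(sim):
    """Return node indices sorted by mean similarity to all other nodes (highest first).

    sim is a symmetric n x n similarity matrix (list of lists, n >= 2).
    Hypothetical stand-in for the paper's average centrality measure.
    """
    n = len(sim)
    scores = [sum(sim[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    # Poses similar to every other pose sit near the "centre" of the graph.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def pose_doublets(dominating_a, dominating_b):
    """All combinations of one dominating pose of performer A with one of performer B."""
    return list(product(dominating_a, dominating_b))


# Hypothetical similarities between four codebook poses of performer A.
sim_a = [
    [1.0, 0.9, 0.8, 0.4],
    [0.9, 1.0, 0.6, 0.2],
    [0.8, 0.6, 1.0, 0.5],
    [0.4, 0.2, 0.5, 1.0],
]
ranking = rank_by_average_similarity(sim_a)
top_a = ranking[:2]
top_b = ["b0", "b1"]  # hypothetical dominating poses of performer B
print(ranking)                      # [0, 2, 1, 3]
print(pose_doublets(top_a, top_b))  # [(0, 'b0'), (0, 'b1'), (2, 'b0'), (2, 'b1')]
```

The Cartesian product mirrors the description of ‘pose doublets’ as all possible combinations of dominating poses; ranking those doublets with the bipartite-graph centrality measure is not attempted here.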


References

[1] Lan, T., Sigal, L., Mori, G.: Social roles in hierarchical models for human activity recognition. In: IEEE-CVPR, pp. 1355–1362 (2012)
[2] Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing human action at a distance in video by key poses. IEEE T-CSVT 21(9), 1228–1241 (2011)
[3] Narayan, B.L., Murthy, C.A., Pal, S.K.: Maxdiff kd-trees for data condensation. Pattern Recognit. Lett. 27(3), 187–200 (2006)
[4] Yao, B., Fei-Fei, L.: Action recognition with exemplar based 2.5D graph matching. In: ECCV, LNCS 7575, pp. 173–186. Springer, Berlin (2012)
[5] Desolneux, A., Moisan, L., Morel, J.-M.: From Gestalt Theory to Image Analysis: A Probabilistic Approach. Springer, Berlin (2008)
[6] Liu, J., Ali, S., Shah, M.: Recognizing human actions using multiple features. In: CVPR. IEEE Computer Society (2008)
[7] Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vis. 79(3), 299–318 (2008)
[8] Liu, C., Yuen, P.C.: Human action recognition using boosted eigenactions. Image Vis. Comput. 28(5), 825–835 (2010)
[9] Mori, G., Ren, X., Efros, A., Malik, J.: Recovering human body configurations: Combining segmentation and recognition. In: CVPR, vol. 2, pp. 326–333. IEEE Computer Society (2004)
[10] Mori, G., Malik, J.: Estimating human body configurations using shape context matching. In: ECCV, LNCS 2352, vol. 3, pp. 666–680. Springer (2002)
[11] Han, D., Bo, L., Sminchisescu, C.: Selection and context for action recognition. In: ICCV, pp. 1933–1940. IEEE Computer Society (2009)
[12] Wang, Y., Mori, G.: Hidden part models for human action recognition: probabilistic vs. max-margin. IEEE T-PAMI 33(7), 1310–1323 (2011)
[13] Poppe, R.: Machine recognition of human activities: a survey. Image Vis. Comput. 28(6), 976–990 (2010)
[14] Ryoo, M.S., Aggarwal, J.K.: Human activity analysis: a review. ACM Comput. Surv. 43(3), 16:1–16:43 (2011)
[15] Wang, Y., Mori, G.: Human action recognition by semi-latent topic models. IEEE T-PAMI 31(10), 1762–1774 (2009)
[16] Ikizler, N., Duygulu, P.: Histogram of oriented rectangles: a new pose descriptor for human action recognition. Image Vis. Comput. 27(10), 1515–1526 (2009)
[17] Lv, F., Nevatia, R.: Single view human action recognition using key pose matching and Viterbi path searching. In: CVPR. IEEE Computer Society (2007)
[18] Du, Y., Chen, F., Xu, W.: Human interaction representation and recognition through motion decomposition. IEEE Signal Process. Lett. 14(12), 952–955 (2007)
[19] Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: IEEE ICCV (2009)
[20] Ni, B., Pei, Y., Liang, Z., Lin, L., Moulin, P.: Integrating multi-stage depth-induced contextual information for human action recognition and localization. In: IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, pp. 1–8 (2013)
[21] Patron-Perez, A., Marszalek, M., Zisserman, A., Reid, I.: High five: recognising human interactions in TV shows. In: British Machine Vision Conference (2010)
[22] Meng, L., Qing, L., Yang, P., Miao, J., Chen, X., Metaxas, D.N.: Activity recognition based on semantic spatial relation. In: International Conference on Pattern Recognition (IEEE-ICPR), pp. 609–612 (2012)
[23] Tang, K., Fei-Fei, L., Koller, D.: Learning latent temporal structure for complex event detection. In: IEEE-CVPR, pp. 1–8 (2012)
[24] Mukherjee, S., Biswas, S.K., Mukherjee, D.P.: Recognizing interaction between human performers using ‘key pose doublet’. In: ACM Multimedia Conference, Arizona, pp. 1329–1332. ACM (2011)
[25] Ryoo, M.S., Aggarwal, J.K.: UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA). http://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html. Accessed 2012
[26] Wolf, C., Mille, J., Lombardi, L.E., Celiktutan, O., Jiu, M., Baccouche, M., Dellandrea, E., Bichot, C., Garcia, C.-E., Sankur, B.: The LIRIS Human Activities Dataset and the ICPR 2012 Human Activities Recognition and Localization Competition. Technical Report RR-LIRIS-2012-004, LIRIS Laboratory (2012)
[27] Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, pp. 886–893. IEEE Computer Society (2005)
[28] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2003)
[29] Navigli, R., Lapata, M.: An experimental study of graph connectivity for unsupervised word sense disambiguation. IEEE T-PAMI 32(4), 678–692 (2010)
[30] Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
[31] Varadhan, S.R.S.: Asymptotic probabilities and differential equations. Commun. Pure Appl. Math. 19(3), 261–286 (1966)
[32] Platt, J.C.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers, pp. 61–74. MIT Press (1999)


Acknowledgments

Certain commercial equipment, instruments, software, or materials are identified in this article to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the equipment, instruments, software or materials are necessarily the best available for the purpose.

Author information

Authors and Affiliations

Information Access Division, National Institute of Standards and Technology (NIST), 100 Bureau Drive, Gaithersburg, MD 20899-8940, USA
Snehasis Mukherjee
Electrical Engineering Department, University of California, Santa Cruz, USA
Sujoy Kumar Biswas
Electronics and Communication Sciences Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108, India
Dipti Prasad Mukherjee


Corresponding author

Correspondence to Snehasis Mukherjee.

Appendix A: Deriving equation (10) [30]

In our problem of detecting meaningful ‘pose doublets’ for an interaction $A_n$, let $\varDelta$ be the number of ‘pose doublets’ in $A_n$. The problem can then be formulated through a sequence of i.i.d. random variables $\{X_q\}_{q=1,2,\ldots,\varDelta}$ with $0\le X_q\le 1$, defined as

$$X_q=\begin{cases} 1 & \text{when } E(u,v)<\eta \text{ for the `pose doublet' } I_q, \\ 0 & \text{otherwise}, \end{cases} \tag{17}$$

for a given $\eta$, where $I_q$ is the $q$th ‘pose doublet’ in $A_n$. We set $S_{\varDelta}=\sum_{q=1}^{\varDelta}X_q$, the number of ‘pose doublets’ in $A_n$ with $E(u,v)<\eta$, and $\nu\varDelta=E\left[S_{\varDelta}\right]$, so that $\nu=E[X_q]=P(X_q=1)$. Then, for $\nu\varDelta<t<\varDelta$ (a non-empty range, since $\nu$ is a probability smaller than 1), putting $\sigma=\frac{t}{\varDelta}$ as in [5], Hoeffding’s inequality [30] gives

$$P_n^{\eta}=P(S_{\varDelta}\ge t)\le e^{-\varDelta\left(\sigma\log\frac{\sigma}{\nu}+(1-\sigma)\log\frac{1-\sigma}{1-\nu}\right)}. \tag{18}$$
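To make (17) and (18) concrete, the sketch below counts $S_{\varDelta}$ from a list of $E(u,v)$ values and evaluates the right-hand side of (18). It assumes natural logarithms and treats $S_{\varDelta}$ as Binomial($\varDelta$, $\nu$), which follows from the i.i.d. 0/1 model of $X_q$, so that the exact tail probability can be compared with the bound. The energies, $\eta$, $\nu$ and $t$ are hypothetical values, and the helper names are illustrative only.

```python
import math


def count_below(energies, eta):
    """S_Delta from (17): number of pose doublets I_q whose E(u, v) is below eta."""
    return sum(1 for e in energies if e < eta)


def kl_bernoulli(sigma, nu):
    """Exponent term of (18): sigma*log(sigma/nu) + (1-sigma)*log((1-sigma)/(1-nu)).

    Natural logarithms; requires 0 < nu < 1 and 0 < sigma < 1.
    """
    return sigma * math.log(sigma / nu) + (1 - sigma) * math.log((1 - sigma) / (1 - nu))


def hoeffding_tail_bound(t, delta, nu):
    """Right-hand side of (18): an upper bound on P(S_Delta >= t)."""
    if not (nu * delta < t < delta):
        raise ValueError("(18) is stated for nu*Delta < t < Delta")
    sigma = t / delta
    return math.exp(-delta * kl_bernoulli(sigma, nu))


def exact_binomial_tail(t, delta, nu):
    """Exact P(S_Delta >= t) when the X_q are i.i.d. with P(X_q = 1) = nu."""
    return sum(math.comb(delta, k) * nu**k * (1 - nu) ** (delta - k)
               for k in range(math.ceil(t), delta + 1))


# Hypothetical E(u, v) values for five pose doublets and a hypothetical cut-off eta.
energies = [0.12, 0.55, 0.31, 0.90, 0.08]
print(count_below(energies, eta=0.35))  # 3

# Hypothetical Delta = 100 doublets, nu = 0.2, threshold t = 40.
bound = hoeffding_tail_bound(40, 100, 0.2)
exact = exact_binomial_tail(40, 100, 0.2)
print(exact <= bound)  # True
```

The exact tail is always below the bound, since (18) is a Chernoff-type bound on the binomial tail; the bound is preferred in the derivation because it has a closed form in $\varDelta$, $\sigma$ and $\nu$.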

Moreover, the right-hand side of (18) satisfies

$$e^{-\varDelta\left(\sigma\log\frac{\sigma}{\nu}+(1-\sigma)\log\frac{1-\sigma}{1-\nu}\right)}\le e^{-\varDelta(\sigma-\nu)^2H(\nu)}, \tag{19}$$

where

$$H(\nu)=\begin{cases}\frac{1}{1-2\nu}\log\frac{1-\nu}{\nu} & \text{when } 0<\nu<\frac{1}{2},\\ \frac{1}{2\nu(1-\nu)} & \text{when } \frac{1}{2}\le\nu<1.\end{cases} \tag{20}$$

Inequalities (18) and (19), with $H(\nu)$ given by (20), are due to Hoeffding; their proofs can be found in [30].
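Inequality (19) and the lower bound $H(\nu)\ge 2$ used below can be spot-checked numerically. The following sketch evaluates $H(\nu)$ from (20) and the exponent of (18) on a grid of $0<\nu<\sigma<1$. It assumes natural logarithms and a grid step of 1/50 (both choices are illustrative), and a grid check is only a sanity test, not a proof.

```python
import math


def H(nu):
    """H(nu) as defined in (20), using natural logarithms."""
    if not (0 < nu < 1):
        raise ValueError("nu must lie in (0, 1)")
    if nu < 0.5:
        return math.log((1 - nu) / nu) / (1 - 2 * nu)
    return 1 / (2 * nu * (1 - nu))


def kl_bernoulli(sigma, nu):
    """Exponent term of (18): sigma*log(sigma/nu) + (1-sigma)*log((1-sigma)/(1-nu))."""
    return sigma * math.log(sigma / nu) + (1 - sigma) * math.log((1 - sigma) / (1 - nu))


def check_19_and_H_lower_bound(steps=50, tol=1e-9):
    """Check kl_bernoulli(sigma, nu) >= (sigma - nu)^2 * H(nu), i.e. (19), and H(nu) >= 2
    on the grid nu = i/steps, sigma = j/steps with 0 < nu < sigma < 1."""
    for i in range(1, steps):
        nu = i / steps
        if H(nu) < 2 - tol:
            return False
        for j in range(i + 1, steps):
            sigma = j / steps
            if kl_bernoulli(sigma, nu) < (sigma - nu) ** 2 * H(nu) - tol:
                return False
    return True


print(check_19_and_H_lower_bound())  # True
print(H(0.5))                        # 2.0
```

For $\nu<\frac{1}{2}$, (19) holds with equality at $\sigma=1-\nu$, which is why the comparison uses a small tolerance.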

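We now apply (18), (19) and (20) to obtain a sufficient condition for $\epsilon$-meaningfulness. Take $t$ satisfying both $t\ge \nu\varDelta+\sqrt{\frac{\log\tau-\log\epsilon}{H(\nu)}}\sqrt{\varDelta}$ and $t<\varDelta$ (this requires $\epsilon\le\tau$, so that the square root is real). Putting $\sigma=\frac{t}{\varDelta}$ and dividing the first inequality by $\varDelta$ gives $\sigma-\nu\ge\sqrt{\frac{\log\tau-\log\epsilon}{H(\nu)\varDelta}}\ge 0$; squaring both sides and multiplying by $\varDelta$ yields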

$$\varDelta(\sigma-\nu)^2\ge\frac{\log\tau-\log\epsilon}{H(\nu)}. \tag{21}$$

Combining (18), (19) and (21), we get

$$P_n^{\eta}\le e^{-\varDelta(\sigma-\nu)^2H(\nu)}\le e^{-\log\tau+\log\epsilon}=\frac{\epsilon}{\tau}. \tag{22}$$

Hence, by the definition of meaningfulness in Eq. (8), the cut-off $\eta$ is $\epsilon$-meaningful.

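Finally, $H(\nu)\ge 2$ for every $\nu\in(0,1)$ by Eq. (20), with the minimum value 2 attained at $\nu=\frac{1}{2}$. Therefore any $t$ with $\nu\varDelta<t<\varDelta$ satisfying $\varDelta(\sigma-\nu)^2\ge\frac{\log\tau-\log\epsilon}{2}$ also satisfies (21), and this gives the sufficient condition of meaningfulness (10).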
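The argument above can be turned into a threshold on $t$. The sketch below reads the sufficient condition as $\nu\varDelta+\sqrt{\frac{\log\tau-\log\epsilon}{2}}\sqrt{\varDelta}\le t<\varDelta$, i.e. the choice of $t$ in the derivation with $H(\nu)$ replaced by its lower bound 2. This is a reading of the derivation, not a quotation of Eq. (10), which lies outside this appendix. It assumes natural logarithms and $0<\epsilon\le\tau$; the function name `sufficient_threshold` and the numbers ($\varDelta=200$, $\nu=0.1$, $\tau=100$, $\epsilon=1$) are hypothetical.

```python
import math


def kl_bernoulli(sigma, nu):
    """Exponent term of (18): sigma*log(sigma/nu) + (1-sigma)*log((1-sigma)/(1-nu))."""
    return sigma * math.log(sigma / nu) + (1 - sigma) * math.log((1 - sigma) / (1 - nu))


def sufficient_threshold(delta, nu, tau, eps, h=2.0):
    """Smallest integer t with t >= nu*delta + sqrt((log tau - log eps) / h) * sqrt(delta).

    h = 2 uses the lower bound H(nu) >= 2; passing h = H(nu) gives the condition before
    that bound is applied. Returns None when no admissible t < delta exists.
    """
    if not (0 < eps <= tau):
        raise ValueError("need 0 < eps <= tau so that the square root is real")
    t = math.ceil(nu * delta + math.sqrt((math.log(tau) - math.log(eps)) / h) * math.sqrt(delta))
    return t if t < delta else None


# Hypothetical values: Delta = 200 pose doublets, nu = 0.1, tau = 100, eps = 1.
delta, nu, tau, eps = 200, 0.1, 100, 1.0
t = sufficient_threshold(delta, nu, tau, eps)
print(t)  # 42
# At this t the bound in (18) is at most eps / tau, as (22) states.
print(math.exp(-delta * kl_bernoulli(t / delta, nu)) <= eps / tau)  # True
```

Passing the exact $H(\nu)$ as `h` gives a threshold that is never larger, since $H(\nu)\ge 2$; using 2 trades some tightness for a condition that does not require evaluating (20).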


About this article

Cite this article

Mukherjee, S., Biswas, S.K. & Mukherjee, D.P. Recognizing interactions between human performers by ‘Dominating Pose Doublet’. Machine Vision and Applications 25, 1033–1052 (2014). https://doi.org/10.1007/s00138-013-0589-7


Received: 18 May 2013
Revised: 22 October 2013
Accepted: 02 December 2013
Published: 03 January 2014
Issue Date: May 2014
DOI: https://doi.org/10.1007/s00138-013-0589-7
