Abstract
We address the problem of recognizing the social roles that people play in an event. Social roles are governed by human interactions and form a fundamental component of human event description. We focus on a weakly supervised setting, where we are given multiple videos belonging to an event class but no role labels for training. Since social roles are characterized by the interactions between people in an event, we propose a Conditional Random Field (CRF) that models inter-role interactions together with person-specific social descriptors. We develop a tractable variational inference procedure that simultaneously infers the model weights and the role assignments of all people in the videos. We also present a new YouTube social roles dataset with ground-truth role annotations, and introduce role annotations for a subset of videos from the TRECVID-MED11 event kits for evaluation. We compare the performance of the model against several baseline methods on these datasets.
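The abstract frames role recognition as inference in a CRF whose unary terms score each person's descriptor against each role and whose pairwise terms score how well two roles go together. As a minimal sketch, assuming fixed, hand-set potentials (the chapter's method also infers the model weights, which this sketch does not attempt), the Python below runs standard coordinate-ascent mean-field updates to obtain an approximate role distribution for each person in one video. The function name `mean_field_roles`, the `affinity` matrix between people, and all numbers are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field_roles(
    unary: List[List[float]],     # unary[i][r]: score for person i taking role r (hypothetical)
    pairwise: List[List[float]],  # pairwise[r][s]: compatibility of roles r and s (hypothetical)
    affinity: List[List[float]],  # affinity[i][j]: interaction strength between persons i, j (hypothetical)
    num_iters: int = 20,
) -> List[List[float]]:
    """Approximate role marginals q[i][r] with coordinate-ascent mean-field.

    Update: q_i(r) is proportional to
        exp(unary[i][r] + sum_{j != i} affinity[i][j] * sum_s q_j(s) * pairwise[r][s])
    """
    n = len(unary)
    k = len(unary[0])
    # Initialise each person's role distribution from the unary scores alone.
    q = [softmax(row) for row in unary]
    for _ in range(num_iters):
        for i in range(n):  # update one person at a time (sequential updates)
            scores = []
            for r in range(k):
                msg = 0.0
                for j in range(n):
                    if j == i:
                        continue
                    # Expected pairwise compatibility of role r with person j's current beliefs.
                    expected = sum(q[j][s] * pairwise[r][s] for s in range(k))
                    msg += affinity[i][j] * expected
                scores.append(unary[i][r] + msg)
            q[i] = softmax(scores)
    return q


# Hypothetical example: 3 people, 2 roles; role 0 is discouraged from appearing twice.
unary = [[2.0, 0.0], [0.3, 0.0], [0.0, 1.0]]
pairwise = [[-2.0, 0.5], [0.5, 0.0]]
affinity = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

q = mean_field_roles(unary, pairwise, affinity)
# Hard role assignment: most probable role per person.
roles = [max(range(len(row)), key=row.__getitem__) for row in q]
print(roles)
```

In this toy setup, person 1 weakly prefers role 0 on its own descriptor, but the negative role-0/role-0 compatibility pushes it to role 1 once person 0 claims role 0, which is the kind of inter-role interaction the CRF is meant to capture. Updating one person at a time keeps each step a coordinate ascent on the mean-field objective.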
Notes
- 1. We use software from http://cmp.felk.cvut.cz/~fisarond/demo/
- 2.
Acknowledgments
We thank A. Alahi, J. Krause, and K. Tang for helpful comments. This research was partially supported by the DARPA Mind’s Eye grant and the IARPA Aladdin grant.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ramanathan, V., Yao, B., Fei-Fei, L. (2014). Social Role Recognition for Human Event Understanding. In: Fu, Y. (ed.) Human-Centered Social Media Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-05491-9_4
DOI: https://doi.org/10.1007/978-3-319-05491-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05490-2
Online ISBN: 978-3-319-05491-9
eBook Packages: Computer Science, Computer Science (R0)