Abstract
Much research on human action recognition has focused on improving performance on lab-collected datasets. Real-world videos, however, are far more diverse, contain more complicated actions, and are often only sparsely and imprecisely labeled, which makes recognizing actions in them a challenging task. The paucity of labeled real-world videos motivates us to “borrow” strength from other resources. Specifically, since many lab datasets are available and related to real-world data, we propose to harness them to facilitate action recognition in real-world videos. Because the action categories of the two sides are usually inconsistent, we design a multi-task learning framework that jointly optimizes the classifiers for both. The general Schatten \(p\)-norm is imposed on the two classifiers to uncover the knowledge they share. In this way, our framework can mine shared knowledge between two datasets even when their action categories differ, which is a major virtue of our method. This shared knowledge is then used to improve action recognition in real-world videos. Extensive experiments on real-world datasets show promising results.
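To make this concrete, one generic form such a joint objective can take (an illustrative sketch only, with our own choice of loss \(\ell\), trade-off parameter \(\lambda\), and stacking convention; the exact formulation appears in the paper body) is
\[ \min_{W_{\mathrm{lab}},\,W_{\mathrm{real}}}\ \ell\big(X_{\mathrm{lab}}W_{\mathrm{lab}},Y_{\mathrm{lab}}\big)+\ell\big(X_{\mathrm{real}}W_{\mathrm{real}},Y_{\mathrm{real}}\big)+\lambda\,\lVert P\rVert_{S_p}^{p},\qquad P=\big[W_{\mathrm{lab}},\,W_{\mathrm{real}}\big], \]
where \(\lVert P\rVert_{S_p}=\big(\sum_i\sigma_i(P)^p\big)^{1/p}\) is the Schatten \(p\)-norm and \(\sigma_i(P)\) are the singular values of the stacked classifier matrix \(P\). A small \(p\) drives \(P\) toward low rank, so the two classifiers are encouraged to share a common subspace even though their label spaces differ.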










Notes
Equation (7) is non-differentiable in the neighborhood of the optimum. Hence, in the implementation, we can define \(D\) as \(D=\frac{p}{2}(PP^T + \varsigma I)^{\frac{p-2}{2}}\), where \(\varsigma\) is a small constant and \(I\) is the identity matrix.
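As a minimal sketch of this smoothing trick (our own illustration, not the authors' code; the function name, shape convention, and defaults are assumptions), \(D\) can be computed through an eigendecomposition of the symmetric matrix \(PP^T + \varsigma I\):

```python
import numpy as np

def smoothed_schatten_d(P, p=1.0, eps=1e-6):
    """D = (p/2) * (P P^T + eps*I)^((p-2)/2), via eigendecomposition.

    P   : (d, c) stacked classifier matrix (hypothetical shape convention)
    p   : Schatten p-norm order
    eps : the footnote's small constant (varsigma)
    """
    d = P.shape[0]
    # P P^T + eps*I is symmetric positive definite, so eigh applies
    # and every eigenvalue is at least eps > 0.
    w, V = np.linalg.eigh(P @ P.T + eps * np.eye(d))
    # Fractional matrix power through the eigenvalues: V diag(w^a) V^T.
    a = (p - 2.0) / 2.0
    return (p / 2.0) * (V * w**a) @ V.T
```

With \(p=1\) this reduces to the familiar trace-norm smoothing \(D=\tfrac{1}{2}(PP^T+\varsigma I)^{-1/2}\).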
Acknowledgments
This paper was partially supported by the U.S. Department of Defense, the U.S. Army Research Office (W911NF-13-1-0277), the National Science Foundation under Grant No. IIS-1251187, the xLiMe EC project, the ARC Project DE130101311, and the Singapore National Research Foundation under its International Research Centre @Singapore Funding Initiative, administered by the IDM Programme Office. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ARO, the National Science Foundation or the U.S. Government.
Communicated by Hal Daumé.
Cite this article
Ma, Z., Yang, Y., Nie, F. et al. Harnessing Lab Knowledge for Real-World Action Recognition. Int J Comput Vis 109, 60–73 (2014). https://doi.org/10.1007/s11263-014-0717-5