Compact representation for large-scale unconstrained video analysis

World Wide Web

Abstract

Recently introduced features (e.g. Fisher vector, VLAD) have achieved state-of-the-art performance in large-scale video analysis systems that aim to understand the content of videos, for tasks such as concept recognition and event detection. However, these features are high-dimensional, which substantially increases computation costs and correspondingly degrades the performance of subsequent learning tasks. The situation becomes even worse when dealing with large-scale video data for which the number of class labels is limited. To address this problem, we propose a novel algorithm to compactly represent huge amounts of unconstrained video data. Specifically, redundant feature dimensions are removed by our proposed feature selection algorithm. Because unlabeled videos are easy to obtain on the web, we apply this feature selection algorithm in a semi-supervised framework to cope with the shortage of class information. Unlike most existing semi-supervised feature selection algorithms, our proposed algorithm does not rely on manifold approximation, i.e. the graph Laplacian, which is expensive to compute for large amounts of data. Thus, it is possible to apply the proposed algorithm to a real large-scale video analysis system. In addition, because the non-smooth objective function is difficult to solve, we develop an efficient iterative approach to seek the global optimum. Extensive experiments are conducted on several real-world video datasets, including KTH, CCV, and HMDB. The experimental results demonstrate the effectiveness of the proposed algorithm.
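The abstract does not state the objective function or the update rule, so the following is only an illustrative sketch of the general idea of sparse feature selection with a non-smooth regularizer solved iteratively. It assumes a generic, fully supervised least-squares loss with an l2,1-norm penalty, minimized by proximal gradient descent; this is a stand-in technique, not the authors' semi-supervised algorithm. The function names, the toy data, and the parameters `gamma` and `iters` are all hypothetical.

```python
import math

def l21_feature_selection(X, Y, gamma=0.5, iters=500):
    """Proximal gradient for 0.5*||XW - Y||_F^2 + gamma*||W||_{2,1}.

    X is n x d (samples by features), Y is n x c (samples by targets),
    both as lists of lists. Returns W (d x c); rows driven to zero
    correspond to features the penalty has discarded.
    NOTE: a generic illustration, not the method from the paper.
    """
    n, d, c = len(X), len(X[0]), len(Y[0])
    # Step size 1/L, bounding the Lipschitz constant L by ||X||_F^2.
    step = 1.0 / sum(v * v for row in X for v in row)
    W = [[0.0] * c for _ in range(d)]
    for _ in range(iters):
        # Residual R = XW - Y, computed from the current W.
        R = [[sum(X[i][j] * W[j][k] for j in range(d)) - Y[i][k]
              for k in range(c)] for i in range(n)]
        for j in range(d):
            # Gradient step on row j of W.
            row = [W[j][k] - step * sum(X[i][j] * R[i][k] for i in range(n))
                   for k in range(c)]
            # Proximal step: shrink the whole row toward zero (group soft-threshold).
            norm = math.sqrt(sum(v * v for v in row))
            shrink = max(0.0, 1.0 - step * gamma / norm) if norm > 0 else 0.0
            W[j] = [shrink * v for v in row]
    return W

def rank_features(W):
    """Return row norms of W and feature indices sorted by decreasing norm."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in W]
    order = sorted(range(len(W)), key=lambda j: norms[j], reverse=True)
    return norms, order

# Toy data (hypothetical): the targets depend on features 0 and 1 only;
# feature 2 is small-magnitude noise.
X = [[1, 0, 0.1], [0, 1, -0.1], [1, 1, 0.05],
     [2, 0, -0.05], [0, 2, 0.1], [1, 2, -0.1]]
Y = [[row[0], row[1]] for row in X]

W = l21_feature_selection(X, Y)
norms, order = rank_features(W)
```

Keeping only the highest-ranked rows of `W` yields a lower-dimensional representation. Because the penalty acts on whole rows, a feature is either kept for all targets or dropped for all of them, which is what makes this kind of regularizer suitable for selection rather than just shrinkage.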

Corresponding author

Correspondence to Sen Wang.

Cite this article

Wang, S., Pan, P., Long, G. et al. Compact representation for large-scale unconstrained video analysis. World Wide Web 19, 231–246 (2016). https://doi.org/10.1007/s11280-015-0354-0

  • DOI: https://doi.org/10.1007/s11280-015-0354-0
