Abstract
Graph-based semi-supervised learning approaches have proven effective and efficient in addressing the insufficiency of labeled training data in many real-world application areas, such as video concept detection. However, the pair-wise similarity metric between samples, a significant factor in these algorithms, has not been fully investigated. Specifically, existing approaches estimate the pair-wise similarity between two samples from the spatial property of video data, while the temporal property, an essential characteristic of video data, is not embedded in the similarity measure. Accordingly, this paper proposes a novel framework for video concept detection called Joint Spatio-Temporal Correlation Learning (JSTCL). The framework takes both the spatial and temporal properties of video data into account simultaneously to improve the computation of pair-wise similarity. We apply the proposed framework to video concept detection and report superior performance compared to key existing approaches on the benchmark TRECVID data set.
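To make the idea of a joint spatio-temporal similarity concrete, the following is a minimal sketch and not the JSTCL formulation itself, which the abstract does not specify. It assumes, purely for illustration, that the spatial term is a Gaussian kernel on shot feature vectors, that the temporal term is a Gaussian kernel on shot timestamps, and that the two are combined by a product. The function names `joint_similarity` and `propagate_labels` and the parameters `sigma_s`, `sigma_t`, and `alpha` are hypothetical. The propagation step uses the standard local-and-global-consistency iteration from the graph-based semi-supervised learning literature.

```python
import numpy as np


def joint_similarity(features, times, sigma_s=1.0, sigma_t=1.0):
    """Hypothetical joint similarity (an assumption, not the paper's formula):
    product of a Gaussian kernel on features and a Gaussian kernel on time."""
    features = np.asarray(features, dtype=float)
    times = np.asarray(times, dtype=float)
    # Squared Euclidean distance between every pair of feature vectors.
    d_feat = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    # Squared temporal gap between every pair of shots.
    d_time = (times[:, None] - times[None, :]) ** 2
    w = np.exp(-d_feat / (2 * sigma_s ** 2)) * np.exp(-d_time / (2 * sigma_t ** 2))
    np.fill_diagonal(w, 0.0)  # no self-loops in the graph
    return w


def propagate_labels(w, y, alpha=0.9, n_iter=100):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y, with S = D^-1/2 W D^-1/2."""
    y = np.asarray(y, dtype=float)
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (s @ f) + (1 - alpha) * y
    return f


# Toy data (made up): six shots forming two groups in feature space and in time.
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
times = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
# +1 = labeled positive, -1 = labeled negative, 0 = unlabeled.
labels = np.array([1.0, 0.0, 0.0, 0.0, 0.0, -1.0])

w = joint_similarity(features, times)
scores = propagate_labels(w, labels)
```

In this toy setup, unlabeled shots take on the sign of the labeled shot that is close to them in both feature space and time. A product of kernels is only one way to combine the two cues; a weighted sum would be an equally plausible assumption.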
This work is supported by the Research Program of Nanjing University of Posts and Telecommunications under No. NY209018 and No. NY209020.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, S., Liang, Z., Liu, Y. (2010). Improving Video Concept Detection Using Spatio-Temporal Correlation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_5
DOI: https://doi.org/10.1007/978-3-642-15702-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer Science, Computer Science (R0)