Improving Video Concept Detection Using Spatio-Temporal Correlation

Zhu, Songhao; Liang, Zhiwei; Liu, Yuncai

doi:10.1007/978-3-642-15702-8_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6297)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

Graph-based semi-supervised learning approaches have proven effective and efficient at addressing the insufficiency of labeled training data in many real-world applications, such as video concept detection. A key component of these algorithms, the pair-wise similarity metric between samples, has nevertheless not been fully investigated. Specifically, existing approaches estimate the pair-wise similarity between two samples from the spatial properties of video data alone; the temporal property, an essential characteristic of video data, is not embedded in the similarity measure. In this paper we therefore propose a novel framework for video concept detection, called Joint Spatio-Temporal Correlation Learning (JSTCL), which takes both the spatial and the temporal properties of video data into account when computing pair-wise similarity. We apply the proposed framework to video concept detection and report superior performance compared with key existing approaches on the benchmark TRECVID data set.
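The abstract does not give the JSTCL formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a pair-wise similarity formed as the product of a Gaussian kernel on feature distance (spatial) and a Gaussian kernel on shot-time distance (temporal), and it plugs that similarity into the standard local-and-global-consistency propagation iteration. The function names `joint_similarity` and `propagate`, the parameters `sigma_s`, `sigma_t`, and `alpha`, and the toy data are all hypothetical.

```python
import math

def joint_similarity(features, times, sigma_s=1.0, sigma_t=2.0):
    """Assumed similarity: spatial Gaussian kernel times temporal Gaussian kernel.

    features: list of feature vectors, one per shot
    times:    list of shot time stamps (or shot indices)
    Returns a symmetric matrix W with a zero diagonal.
    """
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            dt2 = (times[i] - times[j]) ** 2
            w = math.exp(-d2 / (2 * sigma_s ** 2)) * math.exp(-dt2 / (2 * sigma_t ** 2))
            W[i][j] = W[j][i] = w
    return W

def propagate(W, labels, alpha=0.9, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^(-1/2) W D^(-1/2).

    labels: +1 (concept present), -1 (absent), 0 (unlabeled).
    Returns one real-valued score per sample.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    Y = [float(y) for y in labels]
    F = Y[:]
    for _ in range(iters):
        F = [alpha * sum(S[i][j] * F[j] for j in range(n)) + (1 - alpha) * Y[i]
             for i in range(n)]
    return F

# Toy data: six consecutive shots with 1-D features forming two groups.
features = [[0.0], [0.1], [0.2], [3.0], [3.1], [3.2]]
times = [0, 1, 2, 3, 4, 5]
labels = [1, 0, 0, 0, 0, -1]  # only the first and last shots are labeled

W = joint_similarity(features, times)
scores = propagate(W, labels)
print([round(s, 3) for s in scores])
```

In this toy setup the unlabeled shots that are close to the positive shot in both feature space and time receive positive scores, and those close to the negative shot receive negative ones. The multiplicative combination of the two kernels is just one plausible choice; a weighted sum would be an equally reasonable assumption.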

This work is supported by the Research Program of Nanjing University of Posts and Telecommunications under grants No. NY209018 and No. NY209020.

Author information

Authors and Affiliations

Nanjing University of Posts and Telecommunications, Nanjing, 210046, P.R. China
Songhao Zhu & Zhiwei Liang
Shanghai Jiao Tong University, Shanghai, 200240, P.R. China
Yuncai Liu

Editor information

Editors and Affiliations

School of Computer Science, University of Nottingham, Jubilee Campus, NG8 1BB, Nottingham, UK
Guoping Qiu
The Centre for Multimedia Signal Processing, The Hong Kong Polytechnic University, Hong Kong, China
Kin Man Lam
Faculty of System Design, Tokyo Metropolitan University, 6-6, Asahigaoka, 191-0065, Hino-city, Tokyo
Hitoshi Kiya
Shanghai Key Laboratory of Intelligent Information Processing, Department of Computer Science & Engineering, Fudan University, Shanghai, China
Xiang-Yang Xue
Department of Electrical Engineering, University of Southern California, 90089-2564, Los Angeles, CA
C.-C. Jay Kuo
LIACS Media Lab, Leiden University
Michael S. Lew

About this paper

Cite this paper

Zhu, S., Liang, Z., Liu, Y. (2010). Improving Video Concept Detection Using Spatio-Temporal Correlation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_5

DOI: https://doi.org/10.1007/978-3-642-15702-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer Science, Computer Science (R0)
