Graph regularized GM-pLSA and its applications to video content analysis

Zhong, Cencen; Miao, Zhenjiang

doi:10.1007/s00530-014-0378-9


Regular Paper
Published: 03 May 2014

Multimedia Systems, Volume 20, pages 429–445 (2014)

Cencen Zhong¹ &
Zhenjiang Miao¹


Abstract

Because standard probabilistic latent semantic analysis (pLSA) handles only discrete quantities, pLSA with Gaussian mixtures (GM-pLSA) was proposed to extend it to continuous feature spaces, using a Gaussian mixture model to describe the feature distribution under each latent aspect. However, like pLSA, GM-pLSA overlooks the intrinsic interdependence between terms, which is an important cue for improving performance. In this paper, we present graph regularized GM-pLSA (GRGM-pLSA), an extension of GM-pLSA that embeds this term-correlation information into model learning. Specifically, following the manifold regularization principle, we introduce a graph regularizer that characterizes the correlation between terms; by imposing it on the GM-pLSA objective function, we derive the GRGM-pLSA model parameters with a corresponding expectation maximization (EM) algorithm. Furthermore, we devise two applications to video content analysis. The first is video categorization, in which GRGM-pLSA performs feature mapping with two kinds of sub-shot correlation incorporated separately. The second offers a new perspective on video concept detection by recasting the detection task as a GRGM-pLSA-based visual-to-textual feature conversion problem. Extensive experiments in both applications, including comparisons with GM-pLSA and several state-of-the-art approaches, demonstrate the effectiveness of GRGM-pLSA.
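The abstract describes the model only at a high level, so the following Python sketch (NumPy only) illustrates the two ingredients it names, under assumptions that are ours rather than the paper's. First, in GM-pLSA the posterior of aspect z for a continuous feature x in document d is taken as P(z | d, x) ∝ P(z | d) p(x | z), where p(x | z) is a Gaussian mixture; diagonal covariances are assumed here for brevity. Second, the standard manifold regularizer of Belkin et al., R = ½ Σ_ij W_ij ‖P_i − P_j‖² = tr(Pᵀ L P) with L = D − W, is assumed to act on the aspect-posterior rows P_i of correlated terms, with W a symmetric term-correlation matrix. The function names, the matrix W and the neighbour-averaging update in smooth_posteriors (modelled loosely on locally consistent GMM smoothing) are hypothetical; this is not the EM update derived in the paper.

```python
import numpy as np

# Hypothetical sketch: illustrates GM-pLSA posteriors plus a graph regularizer,
# not the authors' GRGM-pLSA EM algorithm.


def gm_plsa_posteriors(X, p_z_given_d, means, variances, mix_weights):
    """Aspect posteriors P(z | d, x_n) for the N feature vectors of one document d.

    Assumed shapes (diagonal covariances are a simplification):
      X: (N, D), p_z_given_d: (Z,), means and variances: (Z, K, D), mix_weights: (Z, K).
    Each aspect z has the GMM density
      p(x | z) = sum_k mix_weights[z, k] * N(x; means[z, k], diag(variances[z, k])).
    """
    X = np.asarray(X, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = X[:, None, None, :] - means[None]                         # (N, Z, K, D)
    log_gauss = -0.5 * (np.log(2 * np.pi * variances)[None]
                        + diff ** 2 / variances[None]).sum(axis=-1)  # (N, Z, K)
    a = np.log(np.asarray(mix_weights, dtype=float))[None] + log_gauss
    m = a.max(axis=-1, keepdims=True)                                # log-sum-exp over k
    log_px_given_z = (m + np.log(np.exp(a - m).sum(axis=-1, keepdims=True)))[..., 0]
    log_joint = np.log(np.asarray(p_z_given_d, dtype=float))[None] + log_px_given_z
    log_joint -= log_joint.max(axis=1, keepdims=True)                # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)                    # (N, Z), rows sum to 1


def graph_laplacian(W):
    """Unnormalised graph Laplacian L = D - W for a symmetric affinity matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def graph_regularizer(W, P):
    """Manifold regularizer R = 1/2 * sum_ij W_ij * ||P_i - P_j||^2 = trace(P^T L P).

    Rows P_i are (assumed) aspect posteriors of terms; W holds hypothetical
    term-correlation weights. Small R means correlated terms share aspects.
    """
    P = np.asarray(P, dtype=float)
    return float(np.trace(P.T @ graph_laplacian(W) @ P))


def smooth_posteriors(W, P, gamma=0.5, n_iter=10):
    """Hypothetical smoothing step: move each row toward the W-weighted mean of
    its neighbours' rows, which tends to lower R; rows stay probability vectors."""
    W = np.asarray(W, dtype=float)
    P = np.asarray(P, dtype=float).copy()
    deg = W.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # avoid division by zero; isolated terms keep their row
    for _ in range(n_iter):
        P = (1.0 - gamma) * P + gamma * (W @ P) / deg
        P /= P.sum(axis=1, keepdims=True)
    return P
```

For example, with two one-dimensional aspects centred at 0 and 5, features at 0, 5 and 2.5, and W linking only the first two, the raw posteriors of the linked features are nearly one-hot and disagree, giving R close to 2. With gamma = 0.5 a single smoothing iteration makes those two rows equal, so R drops to zero, while the isolated feature keeps its posterior. The trace form tr(Pᵀ L P) is used because it needs only matrix products rather than an explicit double loop over term pairs.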



Acknowledgments

This work is supported by the National Science Foundation of China (61273274, 4123104), National 973 Key Research Program of China (2011CB302203), Ph.D. Programs Foundation of Ministry of Education of China (20100009110004), National Key Technology R&D Program of China (2012BAH01F03) and Tsinghua-Tencent Joint Lab for IIT.

Author information

Authors and Affiliations

Institute of Information Science, Beijing Jiaotong University, Beijing, 100044, China
Cencen Zhong & Zhenjiang Miao


Corresponding author

Correspondence to Cencen Zhong.

Additional information

Communicated by B. Huet.


About this article

Cite this article

Zhong, C., Miao, Z. Graph regularized GM-pLSA and its applications to video content analysis. Multimedia Systems 20, 429–445 (2014). https://doi.org/10.1007/s00530-014-0378-9


Received: 06 May 2013
Accepted: 28 March 2014
Published: 03 May 2014
Issue Date: July 2014
DOI: https://doi.org/10.1007/s00530-014-0378-9
