Graph regularized GM-pLSA and its applications to video content analysis

  • Regular Paper
Multimedia Systems

Abstract

Standard probabilistic latent semantic analysis (pLSA) handles only discrete quantities. pLSA with Gaussian mixtures (GM-pLSA) extends it to continuous feature spaces by using a Gaussian mixture model to describe the feature distribution under each latent aspect. However, like pLSA, GM-pLSA overlooks the intrinsic interdependence between terms, which is an important clue for improving performance. In this paper, we present a graph regularized GM-pLSA (GRGM-pLSA) model, an extension of GM-pLSA that embeds this term correlation information into model learning. Specifically, following the manifold regularization principle, we introduce a graph regularizer to characterize the correlation between terms; by imposing it on the objective function of GM-pLSA, we derive the model parameters of GRGM-pLSA via a corresponding expectation-maximization algorithm. We further devise two applications to video content analysis. The first is video categorization, where GRGM-pLSA serves for feature mapping with two kinds of sub-shot correlations incorporated, respectively. The second offers a new perspective on video concept detection by transforming the detection task into a GRGM-pLSA-based visual-to-textual feature conversion problem. Extensive experiments and comparisons with GM-pLSA and several state-of-the-art approaches in both applications demonstrate the effectiveness of GRGM-pLSA.
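
To make the idea of a graph regularizer concrete, the following is a minimal sketch, not the formulation used in this paper, whose exact regularizer and EM updates are not reproduced here. It assumes the generic Laplacian smoothness penalty from manifold regularization, 0.5 * sum_ij W_ij * ||p_i - p_j||^2, where W is a symmetric term-correlation graph and p_i is the aspect distribution of term i. The function names, the trade-off weight `lam`, and the toy inputs are all hypothetical.

```python
def graph_regularizer(weights, posteriors):
    """Generic Laplacian smoothness penalty (an assumption, not the paper's exact form).

    weights:    symmetric n x n list of non-negative edge weights between terms.
    posteriors: n rows, each a distribution over K latent aspects for one term.
    Returns 0.5 * sum_ij weights[i][j] * ||posteriors[i] - posteriors[j]||^2.
    """
    n = len(posteriors)
    total = 0.0
    for i in range(n):
        for j in range(n):
            w = weights[i][j]
            if w == 0.0:
                continue  # unconnected terms contribute nothing
            diff_sq = sum((a - b) ** 2
                          for a, b in zip(posteriors[i], posteriors[j]))
            total += w * diff_sq
    return 0.5 * total


def regularized_objective(log_likelihood, weights, posteriors, lam):
    """Log-likelihood minus lam times the penalty (lam is a hypothetical trade-off weight)."""
    return log_likelihood - lam * graph_regularizer(weights, posteriors)


# Toy graph: terms 0 and 1 are correlated, term 2 is isolated.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
smooth = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]  # linked terms agree
rough = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]   # linked terms disagree

penalty_smooth = graph_regularizer(W, smooth)
penalty_rough = graph_regularizer(W, rough)
```

Under these assumptions, correlated terms that receive similar aspect distributions incur no penalty, while correlated terms that disagree are penalized, so maximizing the regularized objective favors aspect assignments that vary smoothly over the term graph.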

Acknowledgments

This work is supported by the National Science Foundation of China (61273274, 4123104), National 973 Key Research Program of China (2011CB302203), Ph.D. Programs Foundation of Ministry of Education of China (20100009110004), National Key Technology R&D Program of China (2012BAH01F03) and Tsinghua-Tencent Joint Lab for IIT.

Author information

Corresponding author

Correspondence to Cencen Zhong.

Additional information

Communicated by B. Huet.

About this article

Cite this article

Zhong, C., Miao, Z. Graph regularized GM-pLSA and its applications to video content analysis. Multimedia Systems 20, 429–445 (2014). https://doi.org/10.1007/s00530-014-0378-9

  • DOI: https://doi.org/10.1007/s00530-014-0378-9
