
Localizing relevant frames in web videos using topic model and relevance filtering

  • Special Issue Paper
Machine Vision and Applications

Abstract

Numerous web videos with rich metadata are available on the Internet today. Metadata such as video tags facilitate video search and multimedia content understanding, but they also raise a challenge: tags are usually annotated at the video level, whereas many of them describe only part of the video content. Localizing the parts or frames of a web video that are relevant to a given tag is therefore key to many applications and research tasks. In this paper, we propose combining a topic model with relevance filtering to localize relevant frames. Our method has three steps. First, we apply relevance filtering to assign a relevance score to each video frame, and obtain a raw relevant frame set by selecting the top-ranked frames. Second, we group the frames into topics by mining their underlying semantics with latent Dirichlet allocation (LDA), and use the raw relevant set as a validation set to select the relevant topics. Finally, we use the topical relevances to refine the raw relevant frame set and obtain the final results. Experimental results on two real-world web video databases validate the effectiveness of the proposed approach.
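The three steps in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal, dependency-free illustration under several assumptions introduced here: each frame is represented as a bag of visual-word ids, the relevance-filtering scores are taken as given input rather than computed, LDA is fit with a small collapsed Gibbs sampler, and both the topic-selection rule (keep a topic if more of its probability mass falls on the raw relevant set than chance would predict) and the linear fusion of filter score and topical relevance with weight `lam` are hypothetical choices. The names `top_k`, `lda_gibbs`, and `localize` are likewise hypothetical.

```python
import random


def top_k(scores, k):
    """Indices of the k highest-scoring frames (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns each frame's topic mixture theta."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]              # topic counts per frame
    n_kw = [[0] * n_words for _ in range(n_topics)]    # word counts per topic
    n_k = [0] * n_topics                               # total tokens per topic
    z = []                                             # topic of every token
    for d, doc in enumerate(docs):                     # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove token from counts
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + n_words * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                            # add token back
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = []
    for d, doc in enumerate(docs):
        total = len(doc) + n_topics * alpha
        theta.append([(n_dk[d][t] + alpha) / total for t in range(n_topics)])
    return theta


def localize(frames, scores, n_words, k, n_topics=3, lam=0.5, seed=0):
    """Hypothetical three-step pipeline: filter, topic-model, refine."""
    raw = top_k(scores, k)                             # step 1: raw relevant set
    theta = lda_gibbs(frames, n_topics, n_words, seed=seed)  # step 2: topics
    raw_set = set(raw)
    # Share of each topic's mass that lies on the raw set (raw set = validation set).
    rel = []
    for t in range(n_topics):
        mass = sum(th[t] for th in theta)
        rel.append(sum(theta[f][t] for f in raw_set) / mass)
    chance = len(raw) / len(frames)
    selected = [t for t in range(n_topics) if rel[t] > chance]
    # Step 3: frame topical relevance = mass on the selected topics, fused with
    # the min-max normalized filter score, then re-ranked.
    topical = [sum(th[t] for t in selected) for th in theta]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    norm = [(s - lo) / span if span else 0.0 for s in scores]
    final = [lam * a + (1 - lam) * b for a, b in zip(norm, topical)]
    return top_k(final, k), selected


if __name__ == "__main__":
    # Toy data: words 0-2 depict the tagged object, words 3-5 the background.
    frames = [[0, 1, 2, 0], [1, 2, 0, 1], [0, 2, 2, 1], [1, 0, 2, 3],
              [3, 4, 5, 3], [4, 5, 3, 4], [5, 3, 4, 0], [4, 4, 5, 3]]
    # Noisy filter scores: frame 6 is over-scored, frame 2 under-scored.
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.85, 0.15]
    refined, topics = localize(frames, scores, n_words=6, k=4, n_topics=2)
    print("raw:", top_k(scores, 4), "refined:", refined, "topics:", topics)
```

The toy run shows the intended effect: frames that the filter mis-scores can move in or out of the final set when their topic mixture agrees or disagrees with the validated topics. In a real system a library LDA implementation (for example scikit-learn's `LatentDirichletAllocation`) would replace the hand-written sampler; the pure-Python version only keeps the sketch self-contained.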





Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant nos. 61033012, 61173104, 61202133).

Author information


Corresponding author

Correspondence to Haojie Li.


About this article

Cite this article

Li, H., Yi, L., Liu, B. et al. Localizing relevant frames in web videos using topic model and relevance filtering. Machine Vision and Applications 25, 1661–1670 (2014). https://doi.org/10.1007/s00138-013-0537-6



  • DOI: https://doi.org/10.1007/s00138-013-0537-6
