Localizing relevant frames in web videos using topic model and relevance filtering

Li, Haojie; Yi, Lei; Liu, Bin; Wang, Yi

doi:10.1007/s00138-013-0537-6


Special Issue Paper
Published: 18 August 2013

Volume 25, pages 1661–1670, (2014)

Machine Vision and Applications

Haojie Li¹, Lei Yi¹, Bin Liu¹ & Yi Wang¹


Abstract

Numerous web videos with rich metadata are available on the Internet today. Metadata such as video tags create opportunities for video search and multimedia content understanding, but they also pose a challenge: tags are usually annotated at the video level, while many of them describe only part of the video content. Localizing the parts or frames of a web video that are relevant to a given tag is therefore key to many applications and research tasks. In this paper, we propose combining a topic model with relevance filtering to localize relevant frames. Our method has three steps. First, we apply relevance filtering to assign relevance scores to video frames, and obtain a raw relevant frame set by selecting the top-ranked frames. Second, we separate the frames into topics by mining their underlying semantics with latent Dirichlet allocation, and use the raw relevant frame set as a validation set to select relevant topics. Finally, we use the topical relevances to refine the raw relevant frame set and obtain the final results. Experimental results on two real web video databases validate the effectiveness of the proposed approach.
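
The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring or refinement formulas, so everything below is an assumption made for illustration. That includes the function names, the `ratio` and `alpha` parameters, the rule that a topic's relevance is its share of topic mass inside the raw relevant set, and the linear blend used for refinement. The per-frame relevance scores and the per-frame topic distributions (`theta`, as a latent Dirichlet allocation model might infer them) are taken as given toy inputs.

```python
from typing import List, Sequence, Set


def top_ranked(scores: Sequence[float], ratio: float) -> Set[int]:
    """Step 1: indices of the top-scoring frames (the raw relevant set)."""
    k = max(1, int(len(scores) * ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def topic_relevance(theta: Sequence[Sequence[float]], raw: Set[int]) -> List[float]:
    """Step 2: score each topic using the raw set as a validation set.

    Assumed rule: a topic's relevance is the fraction of its total
    probability mass that falls on frames in the raw relevant set.
    """
    n_topics = len(theta[0])
    mass_raw = [sum(theta[i][t] for i in raw) for t in range(n_topics)]
    mass_all = [sum(row[t] for row in theta) for t in range(n_topics)]
    return [r / a if a > 0 else 0.0 for r, a in zip(mass_raw, mass_all)]


def refine(scores: Sequence[float], theta: Sequence[Sequence[float]],
           topic_rel: Sequence[float], alpha: float, ratio: float) -> Set[int]:
    """Step 3: blend filter scores with topical relevance and re-select.

    Assumed rule: combined = alpha * score + (1 - alpha) * topical relevance,
    where a frame's topical relevance is its topic mixture weighted by the
    per-topic relevance values.
    """
    combined = [
        alpha * s + (1 - alpha) * sum(p * r for p, r in zip(row, topic_rel))
        for s, row in zip(scores, theta)
    ]
    return top_ranked(combined, ratio)


# Toy data: six frames, two topics. Frames 0-2 belong mostly to topic 0,
# frames 3-5 mostly to topic 1. Frame 3 has a misleadingly high filter score.
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
theta = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1],
         [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]

raw_set = top_ranked(scores, ratio=0.5)
topic_rel = topic_relevance(theta, raw_set)
final_set = refine(scores, theta, topic_rel, alpha=0.2, ratio=0.5)
print(sorted(raw_set), sorted(final_set))
```

In this toy run the raw set contains frame 3 because of its high filter score. After refinement, frame 2 takes its place, because frame 2 shares the topic that dominates the raw set. How strongly topical relevance may override the filter score is governed here by the hypothetical `alpha` weight.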



Acknowledgments

This work was supported by the National Natural Science Funds of China (61033012, 61173104, 61202133).

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, Dalian, China
Haojie Li, Lei Yi, Bin Liu & Yi Wang


Corresponding author

Correspondence to Haojie Li.


About this article

Cite this article

Li, H., Yi, L., Liu, B. et al. Localizing relevant frames in web videos using topic model and relevance filtering. Machine Vision and Applications 25, 1661–1670 (2014). https://doi.org/10.1007/s00138-013-0537-6


Received: 11 February 2013
Revised: 28 June 2013
Accepted: 30 July 2013
Published: 18 August 2013
Issue Date: October 2014
DOI: https://doi.org/10.1007/s00138-013-0537-6
