On the tag localization of web video

Li, Haojie; Liu, Bin; Yi, Lei; Guan, Yue; Luo, Zhong-Xuan

doi:10.1007/s00530-014-0404-y

Special Issue Paper
Published: 07 August 2014

Multimedia Systems, Volume 22, pages 405–412 (2016)

Abstract

Nowadays, social videos have become pervasive on the web. Social web videos are accompanied by rich contextual information that describes their content and thus greatly facilitates video search and browsing. Generally, such contextual data, for example tags, are provided at the level of the whole video, without any temporal indication of when a tag actually appears in the video, let alone spatial annotation of object-related tags in the video frames. However, many tags describe only parts of the video content. Tag localization, the process of assigning tags to the relevant video segments or frames, or even to regions within frames, is therefore attracting increasing research interest, and a benchmark dataset for the fair evaluation of tag localization algorithms is highly desirable. In this paper, we describe and release a dataset called DUT-WEBV, which contains about 4,000 videos collected from YouTube by issuing 50 concepts as queries. These concepts cover a wide range of semantic aspects, including scenes like “mountain”, events like “flood”, objects like “cows”, sites like “gas station”, and activities like “handshaking”, which makes the tag (i.e., concept) localization task highly challenging. For each video of a tag, we carefully annotate the time intervals during which the tag appears in the video, and, for object-related tags, we also label the spatial location of the object in frames with masks. Besides the videos themselves, contextual information such as thumbnail images, titles, and YouTube categories is also provided. Together with this benchmark dataset, we present a baseline for tag localization using a multiple instance learning approach. Finally, we discuss some open research issues for tag localization in web videos.
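Multiple instance learning (MIL) fits weak, video-level tags naturally. A video is treated as a bag, its segments (or keyframes) are its instances, and a tagged video is assumed to contain at least one relevant segment. The sketch below only illustrates that general assumption and is not the authors' baseline. Several choices in it are hypothetical and made just for the example: the per-segment scores, the max-pooling bag rule, the fixed segment length, the 0.5 threshold, the temporal intersection-over-union measure, and the helper names `bag_score`, `localize` and `temporal_iou`. The code turns per-segment scores into a video-level decision and into predicted time intervals for the tag. It then compares those intervals with annotated ones.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def bag_score(instance_scores: List[float]) -> float:
    """Standard MIL assumption: a bag (video) is as relevant as its best instance (segment)."""
    return max(instance_scores)


def localize(instance_scores: List[float], segment_len: float,
             threshold: float = 0.5) -> List[Interval]:
    """Merge consecutive above-threshold segments into time intervals.

    Hypothetical setup: contiguous, fixed-length segments of `segment_len` seconds.
    The returned intervals never overlap each other.
    """
    intervals: List[Interval] = []
    for i, score in enumerate(instance_scores):
        if score < threshold:
            continue
        start, end = i * segment_len, (i + 1) * segment_len
        if intervals and intervals[-1][1] == start:
            # Adjacent to the previous positive segment: extend that interval.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


def total_length(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def temporal_iou(pred: List[Interval], gt: List[Interval]) -> float:
    """Temporal IoU of two interval sets, each assumed internally non-overlapping."""
    inter = 0.0
    for ps, pe in pred:
        for gs, ge in gt:
            inter += max(0.0, min(pe, ge) - max(ps, gs))
    union = total_length(pred) + total_length(gt) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    scores = [0.1, 0.8, 0.9, 0.2, 0.7]       # hypothetical per-segment scores
    pred = localize(scores, segment_len=2.0)  # [(2.0, 6.0), (8.0, 10.0)]
    print(bag_score(scores))                  # 0.9
    print(temporal_iou(pred, [(3.0, 7.0)]))   # 3/7, about 0.43
```

Max-pooling is the simplest rule consistent with the "at least one relevant instance" assumption. Real MIL methods learn the instance scorer itself from bag labels, and that training step is omitted here.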

Acknowledgments

This work was supported by National Natural Science Funds of China (61033012, 61173104, 61300085) and the Fundamental Research Funds for the Central Universities (DUT13JR03, DUT14QY03).

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, Dalian, China
Haojie Li, Bin Liu, Lei Yi, Yue Guan & Zhong-Xuan Luo

Corresponding author

Correspondence to Bin Liu.

About this article

Cite this article

Li, H., Liu, B., Yi, L. et al. On the tag localization of web video. Multimedia Systems 22, 405–412 (2016). https://doi.org/10.1007/s00530-014-0404-y

Issue Date: July 2016
DOI: https://doi.org/10.1007/s00530-014-0404-y