research-article

ShotTagger: tag location for internet videos

Authors:
Guangda Li, NUS Graduate School for Integrative Sciences and Engineering and National University of Singapore
Meng Wang, National University of Singapore
Yan-Tao Zheng, National University of Singapore
Haojie Li, Dalian University of Technology
Zheng-Jun Zha, National University of Singapore
Tat-Seng Chua, National University of Singapore

ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia Retrieval
April 2011, Article No.: 37, Pages 1–8
https://doi.org/10.1145/1991996.1992033

Published: 18 April 2011

ABSTRACT

Social video sharing websites allow users to annotate videos with descriptive keywords called tags, which greatly facilitate video search and browsing. However, many tags describe only part of the video content and carry no temporal indication of when the tagged content actually appears. Currently, there is very little research on automatically assigning tags to shot-level segments of a video. In this paper, we leverage users' tags as a source for analyzing the content within a video and develop a novel system named ShotTagger to assign tags at the shot level. Locating tags at the shot level involves two steps. The first estimates the distribution of tags within the video, based on a multiple instance learning framework. The second models the semantic correlation of a tag with the other tags of the video in an optimization framework and imposes temporal smoothness across adjacent video shots to refine the shot-level tagging results. We present different applications to demonstrate the usefulness of the tag location scheme in searching and browsing videos. A series of experiments conducted on a set of YouTube videos has demonstrated the feasibility and effectiveness of our approach.
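
The abstract does not state the objective function used in the refinement step, so the following is only a minimal sketch of the temporal-smoothness idea, not ShotTagger's actual formulation. It assumes that each shot already has an initial relevance score for one tag (for example, from the first step) and that refinement minimises a simple quadratic cost: stay close to the initial scores while penalising differences between adjacent shots. The function name `smooth_shot_scores`, the weight `lam`, and the sample scores are all hypothetical, and the semantic correlation between tags is left out entirely.

```python
from typing import List, Sequence


def smooth_shot_scores(scores: Sequence[float], lam: float = 1.0) -> List[float]:
    """Minimise sum_i (f_i - s_i)^2 + lam * sum_i (f_{i+1} - f_i)^2 over f.

    `scores` holds one initial tag score per shot, in temporal order.
    `lam` (an assumed parameter) controls how strongly adjacent shots
    are pulled towards each other.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(scores)
    if n <= 1 or lam == 0:
        return [float(s) for s in scores]

    # Setting the gradient to zero gives the tridiagonal linear system
    # (I + lam * L) f = s, where L is the Laplacian of the chain that
    # links each shot to its temporal neighbours.
    diag = [1.0 + lam * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = -lam  # every sub- and super-diagonal entry

    # Thomas algorithm, forward elimination.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = off / diag[0]
    d_prime[0] = scores[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (scores[i] - off * d_prime[i - 1]) / denom

    # Thomas algorithm, back substitution.
    smoothed = [0.0] * n
    smoothed[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        smoothed[i] = d_prime[i] - c_prime[i] * smoothed[i + 1]
    return smoothed


# Hypothetical per-shot scores for a single tag; isolated spikes are damped.
raw = [0.1, 0.9, 0.2, 0.8, 0.1]
print([round(v, 3) for v in smooth_shot_scores(raw, lam=1.0)])
```

A larger `lam` yields flatter score curves, while `lam = 0` returns the initial scores unchanged. The closed-form solve is used here only because the assumed cost is quadratic; a different objective would need a different solver.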

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published in
ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia Retrieval
April 2011
512 pages
ISBN: 9781450303361
DOI: 10.1145/1991996
General Chairs:
Francesco G. B. De Natale, University of Trento, Italy
Alberto Del Bimbo, University of Florence, Italy
Program Chairs:
Alan Hanjalic, University of Amsterdam, Netherlands
B. S. Manjunath, University of California, Santa Barbara
Shin'ichi Satoh, NII, Japan
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
internet video tagging
tag-based video browsing
tag-based video search
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 254 of 830 submissions, 31%