Article

Correlative multi-label video annotation

Authors:
Guo-Jun Qi, University of Science and Technology of China, Hefei, China
Xian-Sheng Hua, Microsoft Research Asia, Beijing, China
Yong Rui, Microsoft China R&D Group, Beijing, China
Jinhui Tang, University of Science and Technology of China, Hefei, China
Tao Mei, Microsoft Research Asia, Beijing, China
Hong-Jiang Zhang, Microsoft Research Advanced Technology Center, Beijing, China

MM '07: Proceedings of the 15th ACM international conference on Multimedia, September 2007, Pages 17–26
https://doi.org/10.1145/1291233.1291245

Published: 29 September 2007

ABSTRACT

Automatically annotating concepts for video is a key to semantic-level video browsing, search and navigation. The research on this topic evolved through two paradigms. The first paradigm used binary classification to detect each individual concept in a concept set. It achieved only limited success, as it did not model the inherent correlation between concepts, e.g., urban and building. The second paradigm added a second step on top of the individual concept detectors to fuse multiple concepts. However, its performance varies because the errors incurred in the first detection step can propagate to the second fusion step and therefore degrade the overall performance. To address the above issues, we propose a third paradigm which simultaneously classifies concepts and models correlations between them in a single step by using a novel Correlative Multi-Label (CML) framework. We compare the performance between our proposed approach and the state-of-the-art approaches in the first and second paradigms on the widely used TRECVID data set. We report superior performance from the proposed approach.
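
The contrast between detecting each concept independently and scoring all concepts jointly can be illustrated with a minimal sketch. This is not the authors' CML implementation: the abstract does not give the model's form, so the linear detectors, the pairwise bonus matrix `C`, the hand-picked weights, the brute-force search over label vectors, and all function names below are assumptions made purely for illustration.

```python
from itertools import product

import numpy as np


def independent_predict(x, W):
    """First-paradigm style: one linear detector per concept, thresholded at 0."""
    return tuple(int(s > 0) for s in W @ x)


def joint_score(x, y, W, C):
    """Score a whole label vector y in {0,1}^K in one step.

    Unary part: detector evidence for every concept that y switches on.
    Pairwise part: C[p, q] is added when concepts p and q are both on
    (an assumed, simplified stand-in for modeling concept correlation).
    """
    y = np.asarray(y, dtype=float)
    unary = y @ (W @ x)
    pairwise = sum(C[p, q] * y[p] * y[q]
                   for p in range(len(y)) for q in range(p + 1, len(y)))
    return float(unary + pairwise)


def joint_predict(x, W, C):
    """Return the label vector with the highest joint score (brute force, 2^K)."""
    K = W.shape[0]
    return max(product((0, 1), repeat=K), key=lambda y: joint_score(x, y, W, C))


# Hypothetical toy setup: concepts are ("urban", "building").
x = np.array([1.0, 0.5])                 # made-up feature vector
W = np.array([[1.0, 0.0],                # made-up "urban" detector weights
              [-0.2, 0.0]])              # made-up weak "building" detector
C = np.array([[0.0, 0.5],                # made-up bonus when both co-occur
              [0.0, 0.0]])

print(independent_predict(x, W))
print(joint_predict(x, W, C))
```

With these made-up numbers, the independent detectors miss "building" because its own evidence is slightly negative, while the joint score recovers it through the co-occurrence bonus with "urban". Exhaustive search is only workable for a handful of concepts; it is used here to keep the sketch short.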

Supplemental Material

p17-27_150k.mp4 (mp4, 78.6 MB)
p17-27_768k.mp4 (mp4, 281.2 MB)

Index Terms

Correlative multi-label video annotation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published in
MM '07: Proceedings of the 15th ACM international conference on Multimedia
September 2007
1115 pages
ISBN: 9781595937025
DOI: 10.1145/1291233
General Chairs:
Rainer Lienhart, University of Augsburg, Germany
Anand R. Prasad, DoCoMo Euro-Labs, Germany
Program Chairs:
Alan Hanjalic, Delft University of Technology, The Netherlands
Sunghyun Choi, Seoul National University, South Korea
Brian Bailey, University of Illinois at Urbana-Champaign
Nicu Sebe, University of Amsterdam, The Netherlands
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
concept correlation
multi-labeling
video annotation
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%