ABSTRACT
Automatically annotating concepts in video is key to semantic-level video browsing, search and navigation. Research on this topic has evolved through two paradigms. The first paradigm used binary classification to detect each concept in a concept set individually. It achieved only limited success because it did not model the inherent correlations between concepts, e.g., urban and building. The second paradigm added a second step on top of the individual concept detectors to fuse multiple concepts. However, its performance varies because errors incurred in the first (detection) step can propagate to the second (fusion) step and degrade overall performance. To address these issues, we propose a third paradigm that simultaneously classifies concepts and models the correlations between them in a single step, using a novel Correlative Multi-Label (CML) framework. We compare the performance of the proposed approach with that of state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set. We report superior performance from the proposed approach.
Index Terms
- Correlative multi-label video annotation