DOI: 10.1145/1291233.1291245
Article

Correlative multi-label video annotation

Published: 29 September 2007

ABSTRACT

Automatically annotating concepts for video is a key to semantic-level video browsing, search and navigation. The research on this topic evolved through two paradigms. The first paradigm used binary classification to detect each individual concept in a concept set. It achieved only limited success, as it did not model the inherent correlation between concepts, e.g., urban and building. The second paradigm added a second step on top of the individual concept detectors to fuse multiple concepts. However, its performance varies because the errors incurred in the first detection step can propagate to the second fusion step and therefore degrade the overall performance. To address the above issues, we propose a third paradigm which simultaneously classifies concepts and models correlations between them in a single step by using a novel Correlative Multi-Label (CML) framework. We compare the performance between our proposed approach and the state-of-the-art approaches in the first and second paradigms on the widely used TRECVID data set. We report superior performance from the proposed approach.
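
The contrast between the first paradigm and single-step joint prediction can be illustrated with a toy sketch. It is only an illustration under stated assumptions, not the CML formulation itself, which the abstract does not spell out: the function names, the hand-picked scores, the pairwise weight, and the brute-force search over label vectors are all hypothetical.

```python
from itertools import product


def independent_labels(unary):
    """First-paradigm style: threshold each concept score on its own."""
    return tuple(1 if s > 0 else -1 for s in unary)


def joint_labels(unary, pairwise):
    """Pick the label vector in {-1, +1}^K with the highest joint score.

    The joint score adds per-concept terms y_i * unary[i] and pairwise
    terms w * y_i * y_j, so correlated concepts are decided together.
    Brute-force search is an assumption that only suits a small K.
    """
    k = len(unary)

    def score(y):
        s = sum(y[i] * unary[i] for i in range(k))
        s += sum(w * y[i] * y[j] for (i, j), w in pairwise.items())
        return s

    return max(product((-1, 1), repeat=k), key=score)


# Hypothetical scores for the concepts (urban, building).
unary = [0.9, -0.2]        # "building" is weakly rejected on its own
pairwise = {(0, 1): 0.5}   # assumed positive correlation between the two

print(independent_labels(unary))      # (1, -1)
print(joint_labels(unary, pairwise))  # (1, 1)
```

In this toy case the assumed positive correlation between urban and building flips the weak negative decision for building, which is the kind of interaction that independent per-concept detectors cannot express.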


Supplemental Material

p17-27_150k.mp4 (mp4, 78.6 MB)
p17-27_768k.mp4 (mp4, 281.2 MB)


Published in

MM '07: Proceedings of the 15th ACM international conference on Multimedia
September 2007
1115 pages
ISBN: 9781595937025
DOI: 10.1145/1291233

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
