ABSTRACT
We propose n-gram modeling of shot sequences for video semantic indexing, the task of extracting semantic concepts from video shots. Most previous studies of this task have assumed that the shots in a video clip are independent of each other. We instead model the temporal dependency between them by assuming that n consecutive video shots are dependent. Our models improve robustness against occlusion and camera-angle changes by effectively using information from the preceding video shots. In experiments on the TRECVID 2012 Semantic Indexing benchmark, we applied the proposed models to a system using Gaussian mixture models and support vector machines. Mean average precision improved from 30.62% to 32.14%, which is, to the best of our knowledge, the best performance reported on TRECVID 2012 Semantic Indexing.
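The idea of letting a shot borrow evidence from the shots before it can be illustrated with a small sketch. This is not the formulation used in the paper, which the abstract does not spell out; it is a hypothetical re-scoring rule that blends each shot's detector score with the scores of the n-1 preceding shots. The function name `ngram_context_scores`, the `decay` parameter, and the geometric weighting are all assumptions made for illustration.

```python
from typing import List, Sequence


def ngram_context_scores(
    shot_scores: Sequence[float],
    n: int = 3,
    decay: float = 0.5,
) -> List[float]:
    """Blend each shot's score with those of the n-1 preceding shots.

    Hypothetical illustration only: weights decay geometrically with
    distance from the current shot and are normalized to sum to 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")

    rescored = []
    for t in range(len(shot_scores)):
        # Window of at most n shots ending at the current shot t.
        start = max(0, t - n + 1)
        window = shot_scores[start : t + 1]
        # The current shot gets weight 1, the one before it `decay`, and so on.
        weights = [decay ** (len(window) - 1 - k) for k in range(len(window))]
        total = sum(weights)
        rescored.append(sum(w * s for w, s in zip(weights, window)) / total)
    return rescored


# Per-shot scores for one concept; the third shot is assumed to be occluded.
scores = [0.9, 0.8, 0.1, 0.85]
smoothed = ngram_context_scores(scores, n=3, decay=0.5)
```

With n = 1 the function returns the original scores, which corresponds to the independence assumption of earlier work. With larger n, a low score on an occluded shot is pulled up by its neighbors, at the cost of blurring genuine concept boundaries between shots.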
Figure 4: Comparison of our methods with TRECVID 2012 Semantic Indexing submissions. Mean AP of the best submission was 32.10%.
Index Terms
- n-gram Models for Video Semantic Indexing