ABSTRACT
This paper proposes a novel approach to relevance feedback based on the Fisher Kernel representation in the context of multimodal video retrieval. The Fisher Kernel representation describes a set of features by the gradient of their log-likelihood with respect to the parameters of the generative probability distribution that models the features. In the context of relevance feedback, instead of learning the generative probability distribution over all features of the data, we learn it only over the top retrieved results. Hence, during relevance feedback we create a new Fisher Kernel representation based on the most relevant examples. In addition, we propose to use the Fisher Kernel to capture temporal information by cutting a video into smaller segments, extracting a feature vector from each segment, and representing the resulting feature set using the Fisher Kernel representation. We evaluate our method on the MediaEval 2012 Video Genre Tagging Task, a large dataset containing 15,000 videos in 26 categories, totalling up to 2,000 hours of footage. Results show that our method significantly improves over existing state-of-the-art relevance feedback techniques. Furthermore, we show significant improvements from using the Fisher Kernel to capture temporal information, and we demonstrate that Fisher Kernels are well suited for this task.
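As an illustration of the representation described above, the following is a minimal sketch and not the authors' implementation. It assumes the generative model is a diagonal-covariance Gaussian mixture and computes only the gradient with respect to the component means, using the normalisation common in the Fisher vector literature. The function name `fisher_vector_means` and the hand-set mixture parameters are hypothetical; in a relevance feedback setting the mixture would instead be estimated (for example by EM, not shown here) from the features of the top retrieved results, and `X` would hold the per-segment feature vectors of one video.

```python
import numpy as np


def fisher_vector_means(X, weights, means, variances):
    """Fisher vector of a feature set w.r.t. the means of a diagonal GMM.

    X: (T, D) feature set (e.g. one feature vector per video segment).
    weights: (K,) mixture weights; means, variances: (K, D).
    Returns a vector of length K * D.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    T = X.shape[0]

    # Difference of every sample to every component mean: (T, K, D)
    diff = X[:, None, :] - means[None, :, :]

    # log( w_k * N(x_t | mu_k, diag(var_k)) ) for each sample and component
    log_prob = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    log_joint = log_prob + np.log(weights)

    # Soft assignments (posteriors), computed in a numerically stable way
    log_joint -= log_joint.max(axis=1, keepdims=True)
    gamma = np.exp(log_joint)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Normalised gradient of the average log-likelihood w.r.t. the means
    grad = np.einsum("tk,tkd->kd", gamma, diff / np.sqrt(variances))
    grad /= T * np.sqrt(weights)[:, None]
    return grad.ravel()


# Toy mixture with hand-set parameters (assumption: in practice these would
# be fitted on the features of the top retrieved results).
toy_weights = np.array([0.5, 0.5])
toy_means = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_variances = np.ones((2, 2))

# Three segment-level feature vectors of one hypothetical video
segments = np.array([[0.5, -0.2], [9.0, 10.5], [0.1, 0.3]])
video_descriptor = fisher_vector_means(
    segments, toy_weights, toy_means, toy_variances
)
```

The resulting fixed-length vector describes how the segment features of a video deviate from the mixture, so videos with different numbers of segments can be compared with a standard kernel or distance.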
Index Terms
- Fisher kernel based relevance feedback for multimodal video retrieval