DOI: 10.1145/2461466.2461478
research-article

Fisher kernel based relevance feedback for multimodal video retrieval

Published: 16 April 2013

Abstract

This paper proposes a novel approach to relevance feedback based on the Fisher Kernel representation in the context of multimodal video retrieval. The Fisher Kernel representation describes a set of features as the gradient of the log-likelihood with respect to the parameters of the generative probability distribution that models the feature distribution. In the context of relevance feedback, instead of learning the generative probability distribution over all features of the data, we learn it only over the top retrieved results. Hence, during relevance feedback we create a new Fisher Kernel representation based on the most relevant examples. In addition, we propose to use the Fisher Kernel to capture temporal information by cutting a video into smaller segments, extracting a feature vector from each segment, and representing the resulting feature set using the Fisher Kernel representation. We evaluate our method on the MediaEval 2012 Video Genre Tagging Task, a large dataset containing 15,000 videos in 26 categories, totalling up to 2,000 hours of footage. Results show that our method significantly improves on existing state-of-the-art relevance feedback techniques. Furthermore, we show significant improvements from using the Fisher Kernel to capture temporal information, and we demonstrate that Fisher Kernels are well suited to this task.
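The idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the generative model is taken to be a diagonal-covariance Gaussian mixture, only the gradient with respect to the component means is computed, and the scaling follows the common improved Fisher kernel formulation. The function names (`fit_gmm`, `fisher_vector`), the toy data, and the choice of top-ranked videos are all hypothetical, and the re-ranking step itself is left out.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    logs = [math.log(w) + gaussian_log_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fit_gmm(points, init_means, n_iter=20, min_var=1e-3):
    """A few EM iterations for a diagonal GMM (assumed generative model)."""
    means = [list(m) for m in init_means]
    k, dim, n = len(means), len(means[0]), len(points)
    weights = [1.0 / k] * k
    variances = [[1.0] * dim for _ in range(k)]
    for _ in range(n_iter):
        gammas = [responsibilities(x, weights, means, variances) for x in points]
        for j in range(k):
            nj = sum(g[j] for g in gammas) + 1e-12  # guard against empty components
            weights[j] = nj / n
            means[j] = [sum(g[j] * x[d] for g, x in zip(gammas, points)) / nj
                        for d in range(dim)]
            variances[j] = [max(min_var,
                                sum(g[j] * (x[d] - means[j][d]) ** 2
                                    for g, x in zip(gammas, points)) / nj)
                            for d in range(dim)]
    return weights, means, variances

def fisher_vector(features, weights, means, variances):
    """Gradient of the average log-likelihood w.r.t. the component means,
    scaled by 1 / sqrt(weight); returned as one flat list."""
    n, dim = len(features), len(means[0])
    grad = [[0.0] * dim for _ in means]
    for x in features:
        gamma = responsibilities(x, weights, means, variances)
        for k, (m, v) in enumerate(zip(means, variances)):
            for d in range(dim):
                grad[k][d] += gamma[k] * (x[d] - m[d]) / math.sqrt(v[d])
    return [val / (n * math.sqrt(weights[k]))
            for k, row in enumerate(grad) for val in row]

# Hypothetical toy data: each video is a list of 2-D segment feature vectors.
videos = {
    "a": [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]],
    "b": [[1.0, 1.1], [0.9, 1.0]],
    "c": [[0.1, 0.0], [1.0, 0.9], [0.0, 0.2]],
}
top_ranked = ["a", "b"]  # assumed output of an initial retrieval step

# Relevance feedback: learn the model only from segments of the top results.
pooled = [seg for name in top_ranked for seg in videos[name]]
weights, means, variances = fit_gmm(pooled, init_means=[[0.0, 0.0], [1.0, 1.0]])

# Each video becomes one fixed-length vector, whatever its number of segments.
signatures = {name: fisher_vector(segs, weights, means, variances)
              for name, segs in videos.items()}
```

Because every video maps to a vector of the same length (number of components times feature dimension), the signatures could be passed to any classifier or similarity measure for re-ranking; which one the paper uses is not stated in the abstract.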

Published In

ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
April 2013
362 pages
ISBN: 9781450320337
DOI: 10.1145/2461466
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fisher kernels
  2. multimodal video retrieval
  3. relevance feedback

Qualifiers

  • Research-article

Conference

ICMR'13
Acceptance Rates

ICMR '13 Paper Acceptance Rate 38 of 96 submissions, 40%;
Overall Acceptance Rate 254 of 830 submissions, 31%

Cited By

  • (2019) Security Model of Internet of Things Based on Binary Wavelet and Sparse Neural Network. International Journal of Mobile Computing and Multimedia Communications 10:1 (1-17). DOI: 10.4018/IJMCMC.2019010101. Online publication date: 1-Jan-2019
  • (2019) Robust and Efficient Modulation Recognition Based on Local Sequential IQ Features. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications (1612-1620). DOI: 10.1109/INFOCOM.2019.8737397. Online publication date: 29-Apr-2019
  • (2016) Mental Visual Indexing. Proceedings of the 24th ACM international conference on Multimedia (621-625). DOI: 10.1145/2964284.2967296. Online publication date: 1-Oct-2016
  • (2016) On interactive learning-to-rank for IR. Neurocomputing 208:C (3-24). DOI: 10.1016/j.neucom.2016.03.084. Online publication date: 5-Oct-2016
  • (2016) A statistical framework for online learning using adjustable model selection criteria. Engineering Applications of Artificial Intelligence 49:C (19-42). DOI: 10.1016/j.engappai.2015.10.011. Online publication date: 1-Mar-2016
  • (2016) Fisher Kernel Temporal Variation-based Relevance Feedback for video retrieval. Computer Vision and Image Understanding 143:C (38-51). DOI: 10.1016/j.cviu.2015.10.005. Online publication date: 1-Feb-2016
  • (2015) Gradient-based Signatures for Efficient Similarity Search in Large-scale Multimedia Databases. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (1241-1250). DOI: 10.1145/2806416.2806459. Online publication date: 17-Oct-2015
  • (2014) Zero-Example Event Search using MultiModal Pseudo Relevance Feedback. Proceedings of International Conference on Multimedia Retrieval (297-304). DOI: 10.1145/2578726.2578764. Online publication date: 1-Apr-2014
  • (2013) Daily Living Activities Recognition via Efficient High and Low Level Cues Combination and Fisher Kernel Representation. Image Analysis and Processing – ICIAP 2013 (431-441). DOI: 10.1007/978-3-642-41181-6_44. Online publication date: 2013
