Abstract
The Fisher Vector (FV) representation of images can be seen as an extension of the popular bag-of-visual-words (BOV) representation. Both are based on an intermediate representation, the visual vocabulary, built in the low-level feature space. If a probability density function (in our case, a Gaussian Mixture Model) is used to model the visual vocabulary, we can represent an image by the gradient of the log-likelihood with respect to the parameters of the model. The Fisher Vector is the concatenation of these partial derivatives and describes the direction in which the parameters of the model should be modified to best fit the data. This representation has the advantage of giving classification performance similar to, or even better than, that of BOV obtained with supervised visual vocabularies, while being class-independent. This latter property allows it to be used both in supervised tasks (categorization, semantic image segmentation) and in unsupervised tasks (clustering, retrieval). In this paper we show how it has been successfully applied to these problems, achieving state-of-the-art performance.
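To make this description concrete, the following is a minimal pure-Python sketch of how a Fisher Vector could be computed from the local descriptors of one image. It assumes a diagonal-covariance GMM whose weights, means and standard deviations are already known. It also assumes the closed-form gradients with respect to the means and standard deviations, each scaled by an approximation of the Fisher information, as is common for GMM vocabularies. Gradients with respect to the mixture weights are omitted for brevity. All function names and the toy vocabulary are hypothetical illustrations, not the authors' code. A soft BOV histogram is computed alongside to show one way of seeing the FV as an extension of BOV: the histogram only averages each descriptor's posterior over the Gaussians, while the FV accumulates posterior-weighted deviations from each Gaussian.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_log_density(x: Vector, mean: Vector, sigma: Vector) -> float:
    """Log-density of a Gaussian with diagonal covariance (per-dimension std devs)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((xi - m) / s) ** 2
        for xi, m, s in zip(x, mean, sigma)
    )


def posteriors(x: Vector, weights: Vector, means: Sequence[Vector],
               sigmas: Sequence[Vector]) -> List[float]:
    """Soft assignments p(i | x) of descriptor x to each Gaussian (log-sum-exp for stability)."""
    logs = [math.log(w) + gaussian_log_density(x, m, s)
            for w, m, s in zip(weights, means, sigmas)]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_bov(descriptors: Sequence[Vector], weights: Vector,
             means: Sequence[Vector], sigmas: Sequence[Vector]) -> List[float]:
    """Soft BOV histogram: average posterior per Gaussian (K values, summing to 1)."""
    hist = [0.0] * len(weights)
    for x in descriptors:
        for i, g in enumerate(posteriors(x, weights, means, sigmas)):
            hist[i] += g
    return [h / len(descriptors) for h in hist]


def fisher_vector(descriptors: Sequence[Vector], weights: Vector,
                  means: Sequence[Vector], sigmas: Sequence[Vector]) -> List[float]:
    """Concatenated gradients w.r.t. means and std devs (2*K*D values), each scaled by
    a closed-form approximation of the Fisher information (mixture weights omitted)."""
    K, D, T = len(weights), len(means[0]), len(descriptors)
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = posteriors(x, weights, means, sigmas)
        for i in range(K):
            for d in range(D):
                z = (x[d] - means[i][d]) / sigmas[i][d]  # whitened deviation
                g_mu[i][d] += gamma[i] * z
                g_sigma[i][d] += gamma[i] * (z * z - 1.0)
    fv: List[float] = []
    for i in range(K):
        fv.extend(v / (T * math.sqrt(weights[i])) for v in g_mu[i])
    for i in range(K):
        fv.extend(v / (T * math.sqrt(2.0 * weights[i])) for v in g_sigma[i])
    return fv


if __name__ == "__main__":
    # Hypothetical toy vocabulary: K = 2 Gaussians in D = 2 dimensions.
    w = [0.5, 0.5]
    mu = [[0.0, 0.0], [3.0, 3.0]]
    sd = [[1.0, 1.0], [1.0, 1.0]]
    X = [[0.1, -0.2], [2.9, 3.1], [1.5, 1.5]]  # local descriptors of one "image"
    print(len(soft_bov(X, w, mu, sd)))       # 2 (K)
    print(len(fisher_vector(X, w, mu, sd)))  # 8 (2 * K * D)
```

For the same vocabulary size K, the soft BOV histogram keeps only one averaged posterior per Gaussian. The FV additionally encodes first- and second-order deviations of the descriptors from each Gaussian, which is why it has 2KD dimensions in this sketch.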
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Csurka, G., Perronnin, F. (2011). Fisher Vectors: Beyond Bag-of-Visual-Words Image Representations. In: Richard, P., Braz, J. (eds) Computer Vision, Imaging and Computer Graphics. Theory and Applications. VISIGRAPP 2010. Communications in Computer and Information Science, vol 229. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25382-9_2
DOI: https://doi.org/10.1007/978-3-642-25382-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25381-2
Online ISBN: 978-3-642-25382-9
eBook Packages: Computer Science, Computer Science (R0)