ABSTRACT
The Vector of Locally Aggregated Descriptors (VLAD) method, developed from the bag-of-words (BoW) model and the Fisher Vector, has achieved great success in image classification and retrieval. However, traditional VLAD assigns each local descriptor only to its closest visual word in the codebook, a hard voting process that leads to a large quantization error. In this paper, we propose an approach that fuses VLAD with locality-constrained linear coding (LLC). In contrast to the original method, several nearest-neighbor centers are considered when assigning each local descriptor, and the reconstruction coefficients of LLC are used to obtain the weights of these centers. Because the reconstruction coefficients themselves represent local descriptors well, we also combine them with the VLAD coding. Experiments on the 15 Scenes, UIUC Sports Event and Corel 10 datasets demonstrate that the proposed method achieves outstanding classification accuracy. The approach also adds little computational cost when encoding features.
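The difference between hard assignment and LLC-weighted assignment can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`llc_weights`, `vlad_hard`, `vlad_llc`), the neighbor count `k`, the regularizer `reg`, and the final L2 normalization are all assumptions made for illustration, and the weights are computed with the commonly used approximated LLC solver (k nearest centers, sum-to-one constraint). The additional combination of the LLC coefficients with the VLAD code mentioned in the abstract is not shown.

```python
import numpy as np


def llc_weights(x, centers, k=5, reg=1e-4):
    """Approximated LLC: reconstruct x from its k nearest centers.

    Returns the indices of those centers and coefficients that sum to 1.
    (k and reg are illustrative defaults, not values from the paper.)
    """
    dists = np.sum((centers - x) ** 2, axis=1)
    idx = np.argsort(dists)[:k]            # k nearest visual words
    z = centers[idx] - x                   # neighbors shifted so x is the origin
    cov = z @ z.T                          # k x k local covariance
    cov += reg * max(np.trace(cov), 1e-12) * np.eye(k)  # keep the system well posed
    w = np.linalg.solve(cov, np.ones(k))
    return idx, w / w.sum()                # enforce the sum-to-one constraint


def _l2_normalize(v):
    """Flatten the K x D residual matrix and scale it to unit length."""
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def vlad_hard(descriptors, centers):
    """Traditional VLAD: each descriptor votes only for its closest center."""
    v = np.zeros_like(centers, dtype=float)
    for x in descriptors:
        j = np.argmin(np.sum((centers - x) ** 2, axis=1))
        v[j] += x - centers[j]
    return _l2_normalize(v)


def vlad_llc(descriptors, centers, k=5):
    """Assumed fusion: residuals to the k nearest centers, weighted by LLC coefficients."""
    v = np.zeros_like(centers, dtype=float)
    for x in descriptors:
        idx, w = llc_weights(x, centers, k)
        v[idx] += w[:, None] * (x - centers[idx])
    return _l2_normalize(v)
```

With `k=1` the single coefficient is 1, so `vlad_llc` reduces to `vlad_hard`; larger `k` spreads each descriptor's residual over several centers, which is the soft voting the abstract describes.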
Index Terms
- VLAD Encoding Based on LLC for Image Classification