Abstract
The bag-of-features (BoF) model provides a way to construct high-level representations for image classification. Although spatial pyramid matching (SPM) has been incorporated into many of its extensions, these models intrinsically lack a mechanism for exploiting frequency-domain information. In this paper, we propose the locality-constrained encoding of frequency and spatial information (LEFSI) algorithm, in which an image is decomposed into multiple frequency components and each component is further partitioned into subregions using SPM. Scale-invariant feature transform (SIFT) descriptors are first computed in each subregion and then converted into a global descriptor using locality-constrained linear coding (LLC) with a codebook generated on a category-by-category basis. The image feature is defined as the concatenation of the global descriptors constructed in all subregions. We evaluated this algorithm against several state-of-the-art models on six benchmark datasets. Our results suggest that the proposed LEFSI algorithm describes images more effectively and yields more accurate image classification.
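The abstract does not spell out the individual steps, so the following is only a minimal NumPy sketch of the kind of pipeline it describes, not the authors' implementation. Several choices are assumptions: the frequency decomposition is an FFT-based split into radial bands (the abstract does not name the transform); encoding uses the approximated k-nearest-neighbour LLC solver of Wang et al. (CVPR 2010) with max pooling and L2 normalisation; the pyramid uses 1×1, 2×2 and 4×4 grids; and the "category-by-category" codebook is modelled by stacking one codebook per class. Random arrays stand in for SIFT descriptors and their positions, and the function names `frequency_split`, `llc_encode`, `spm_max_pool` and `image_feature` are hypothetical.

```python
import numpy as np


def frequency_split(img, cutoffs=(0.1, 0.3)):
    """Split a grayscale image into radial frequency bands (hypothetical choice).

    The band masks partition the shifted spectrum, so the bands sum back to img.
    """
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)  # normalised radius
    edges = (0.0, *cutoffs, np.inf)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (r >= lo) & (r < hi)
        bands.append(np.real(np.fft.ifft2(np.fft.ifftshift(F * mask))))
    return bands


def llc_encode(X, B, k=5, beta=1e-4):
    """Approximated LLC: reconstruct each descriptor from its k nearest codewords.

    X: (N, D) local descriptors, B: (M, D) codebook. Returns (N, M) sparse codes
    whose nonzero entries sum to one for every descriptor.
    """
    N, M = X.shape[0], B.shape[0]
    d2 = (X ** 2).sum(1)[:, None] - 2 * X @ B.T + (B ** 2).sum(1)[None, :]
    knn = np.argsort(d2, axis=1)[:, :k]
    codes = np.zeros((N, M))
    for i in range(N):
        z = B[knn[i]] - X[i]                   # neighbours relative to descriptor
        C = z @ z.T                            # local covariance, k x k
        C += np.eye(k) * beta * np.trace(C)    # regularise for stability
        w = np.linalg.solve(C, np.ones(k))
        codes[i, knn[i]] = w / w.sum()         # enforce the sum-to-one constraint
    return codes


def spm_max_pool(codes, xy, width, height, levels=(1, 2, 4)):
    """Max-pool codes over 1x1, 2x2 and 4x4 grids, concatenate, then L2-normalise."""
    pooled = []
    for L in levels:
        cx = np.minimum((xy[:, 0] * L / width).astype(int), L - 1)
        cy = np.minimum((xy[:, 1] * L / height).astype(int), L - 1)
        for gy in range(L):
            for gx in range(L):
                in_cell = (cx == gx) & (cy == gy)
                pooled.append(codes[in_cell].max(axis=0) if in_cell.any()
                              else np.zeros(codes.shape[1]))
    f = np.concatenate(pooled)
    return f / (np.linalg.norm(f) + 1e-12)


def image_feature(band_descriptors, band_xy, B, width, height):
    """Concatenate the pooled LLC codes of every frequency band."""
    return np.concatenate([
        spm_max_pool(llc_encode(X, B), xy, width, height)
        for X, xy in zip(band_descriptors, band_xy)
    ])


# Usage with random stand-ins for SIFT descriptors (128-D) and their positions.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
bands = frequency_split(img)
per_class = [rng.random((16, 128)) for _ in range(3)]  # one codebook per class
B = np.vstack(per_class)                               # 48 codewords in total
descs = [rng.random((200, 128)) for _ in bands]
xys = [rng.random((200, 2)) * 64 for _ in bands]
feat = image_feature(descs, xys, B, 64, 64)
print(feat.shape)  # (3024,)
```

With three frequency bands, 48 codewords and 1 + 4 + 16 = 21 pyramid cells, the concatenated feature has 3 × 48 × 21 = 3024 dimensions, which shows how quickly the dimensionality grows with the number of components and codewords.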

Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 61471297 and 61771397, in part by the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University, and in part by Australian Research Council (ARC) grants.
Cite this article
Pan, Y., Xia, Y., Song, Y. et al. Locality constrained encoding of frequency and spatial information for image classification. Multimed Tools Appl 77, 24891–24907 (2018). https://doi.org/10.1007/s11042-018-5712-3
DOI: https://doi.org/10.1007/s11042-018-5712-3