Locality constrained encoding of frequency and spatial information for image classification

Pan, Yongsheng; Xia, Yong; Song, Yang; Cai, Weidong

doi:10.1007/s11042-018-5712-3

Published: 01 March 2018

Multimedia Tools and Applications, Volume 77, pages 24891–24907 (2018)

Yongsheng Pan¹, Yong Xia¹,² (ORCID: orcid.org/0000-0001-9273-2847), Yang Song³ & Weidong Cai³

Abstract

The bag-of-features (BoF) model provides a way to construct a high-level representation for image classification. Although spatial pyramid matching (SPM) has been incorporated into many of its extensions, these models intrinsically lack a mechanism to use frequency-domain information. In this paper, we propose the locality-constrained encoding of frequency and spatial information (LEFSI) algorithm, in which an image is decomposed into multiple frequency components and each component is further divided into subregions using SPM. Scale-invariant feature transform (SIFT) descriptors are first calculated in each subregion and then converted into a global descriptor using a codebook generated on a category-by-category basis and locality-constrained linear coding (LLC). The image feature is defined as the concatenation of the global descriptors constructed in all subregions. We evaluated this algorithm against several state-of-the-art models on six benchmark datasets. Our results suggest that the proposed LEFSI algorithm can describe images more effectively and provide more accurate image classification.
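The encoding stage described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the approximated LLC solver of Wang et al. (2010) with k nearest codewords, max pooling over a 1x1, 2x2, 4x4 spatial pyramid, and L2 normalisation, none of which the abstract specifies. The function names `llc_encode` and `spm_max_pool`, the parameters `knn` and `beta`, and the random arrays standing in for SIFT descriptors, keypoint positions, frequency components, and the category-wise codebook are all hypothetical.

```python
import numpy as np

def llc_encode(descriptors, codebook, knn=5, beta=1e-4):
    """Approximated LLC: reconstruct each descriptor from its knn nearest codewords."""
    n, m = descriptors.shape[0], codebook.shape[0]
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = (np.sum(descriptors ** 2, axis=1, keepdims=True)
          - 2.0 * descriptors @ codebook.T
          + np.sum(codebook ** 2, axis=1))
    nearest = np.argsort(d2, axis=1)[:, :knn]
    codes = np.zeros((n, m))
    ones = np.ones(knn)
    for i in range(n):
        z = codebook[nearest[i]] - descriptors[i]   # shift local bases to the origin
        cov = z @ z.T                               # local covariance (knn x knn)
        cov += np.eye(knn) * (beta * np.trace(cov) + 1e-12)  # regularise for stability
        w = np.linalg.solve(cov, ones)
        codes[i, nearest[i]] = w / w.sum()          # enforce the sum-to-one constraint
    return codes

def spm_max_pool(codes, xy, width, height, levels=(1, 2, 4)):
    """Max-pool codes inside each pyramid cell, then L2-normalise the concatenation."""
    pooled = []
    for g in levels:
        col = np.minimum((xy[:, 0] * g / width).astype(int), g - 1)
        row = np.minimum((xy[:, 1] * g / height).astype(int), g - 1)
        cell = row * g + col
        for c in range(g * g):
            mask = cell == c
            pooled.append(codes[mask].max(axis=0) if mask.any()
                          else np.zeros(codes.shape[1]))
    feat = np.concatenate(pooled)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat

# Stand-in data (assumption): 3 frequency components of a 64x64 image,
# each with 50 SIFT-like 128-D descriptors and their (x, y) positions.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 128))   # placeholder for the category-wise codebook
components = [(rng.normal(size=(50, 128)), rng.uniform(0, 64, size=(50, 2)))
              for _ in range(3)]

# Image feature: concatenation over all components and all pyramid cells.
feature = np.concatenate([spm_max_pool(llc_encode(d, codebook), xy, 64, 64)
                          for d, xy in components])
print(feature.shape)  # (2016,) = 3 components x 21 cells x 32 codewords
```

In this sketch the feature length grows linearly with the number of frequency components, the number of pyramid cells, and the codebook size, which is why such features are typically paired with a linear classifier.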

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61471297 and 61771397, in part by the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University, and in part by Australian Research Council (ARC) grants.

Author information

Authors and Affiliations

Shaanxi Key Laboratory of Speech & Image Information Processing (SAIIP), School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072, China
Yongsheng Pan & Yong Xia
Centre for Multidisciplinary Convergence Computing (CMCC), School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072, China
Yong Xia
Biomedical and Multimedia Information Technology (BMIT) Research Group, School of Information Technologies, University of Sydney, Camperdown, NSW, 2006, Australia
Yang Song & Weidong Cai

Corresponding author

Correspondence to Yong Xia.

About this article

Cite this article

Pan, Y., Xia, Y., Song, Y. et al. Locality constrained encoding of frequency and spatial information for image classification. Multimed Tools Appl 77, 24891–24907 (2018). https://doi.org/10.1007/s11042-018-5712-3

Received: 03 May 2017
Revised: 01 December 2017
Accepted: 22 January 2018
Published: 01 March 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11042-018-5712-3
