Land-use scene classification: a comparative study on bag of visual word framework

Shahriari, Mana; Bergevin, Robert

doi:10.1007/s11042-016-4316-z

Published: 11 January 2017

Multimedia Tools and Applications, Volume 76, pages 23059–23075 (2017)

513 Accesses
18 Citations

Abstract

With the successful launch of high spatial resolution (HSR) sensors, highly detailed spatial information has become available for remote sensing research. This improvement has allowed researchers to monitor environmental changes at a small spatial scale. However, traditional pixel-based classification approaches cannot interpret high spatial resolution remote sensing imagery effectively. The bag of visual words (BoVW) framework, on the other hand, is becoming one of the most popular approaches for evaluating performance on remote sensing image datasets. While pixel-based approaches may not fully describe very high-resolution remote sensing images, the BoVW model narrows the gap between low-level features and high-level semantic features by generating an intermediate description of image features. This paper presents a comparative study that evaluates the potential of different BoVW coding approaches for solving the land-use scene classification problem. First, this work summarizes different configurations of the BoVW framework in terms of coding and clustering. Next, we perform an extensive evaluation of BoVW on land-use scene classification and retrieval. Finally, we draw several conclusions regarding the different coding strategies of BoVW, the codebook size, and the number of training images. The approach is validated on two commonly used remote sensing datasets: UC Merced, a 21-class land-use dataset, and RSDataset, a 19-class satellite scene dataset.
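The abstract compares BoVW coding strategies and codebook sizes. As a minimal illustration, and not the authors' implementation, the sketch below learns a codebook with plain k-means and then encodes an image's local descriptors with two common coding schemes: hard assignment (each descriptor votes for its nearest visual word) and soft assignment (each descriptor spreads a unit weight over all words in proportion to exp(-beta * squared distance)). Both use sum pooling followed by L1 normalization. The function names, the `beta` parameter, and the toy 2-D descriptors are hypothetical; a real pipeline would use local descriptors such as SIFT and a much larger codebook.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(d: Sequence[float], codebook: List[Vector]) -> int:
    """Index of the visual word closest to descriptor d."""
    return min(range(len(codebook)), key=lambda j: sq_dist(d, codebook[j]))


def kmeans_codebook(descriptors: List[Vector], k: int,
                    iters: int = 20, seed: int = 0) -> List[Vector]:
    """Learn k visual words with plain Lloyd's k-means (assumed clustering step)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster becomes empty
                dim = len(members[0])
                centers[j] = [sum(m[i] for m in members) / len(members)
                              for i in range(dim)]
    return centers


def l1_normalize(hist: Vector) -> Vector:
    """Scale a histogram so it sums to 1 (left as zeros if empty)."""
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def hard_assign(descriptors: List[Vector], codebook: List[Vector]) -> Vector:
    """Hard-assignment coding with sum pooling: count nearest-word votes."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    return l1_normalize(hist)


def soft_assign(descriptors: List[Vector], codebook: List[Vector],
                beta: float = 1.0) -> Vector:
    """Soft-assignment coding with sum pooling: Gaussian-like weights per word."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        dists = [sq_dist(d, c) for c in codebook]
        m = min(dists)  # shift by the minimum distance for numerical stability
        weights = [math.exp(-beta * (x - m)) for x in dists]
        z = sum(weights)
        for j, w in enumerate(weights):
            hist[j] += w / z
    return l1_normalize(hist)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy "descriptors": two well-separated 2-D blobs.
    train = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
             + [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(50)])
    codebook = kmeans_codebook(train, k=2)
    image = [[0.0, 0.0], [0.1, -0.1], [5.0, 5.0]]
    print("hard:", hard_assign(image, codebook))
    print("soft:", soft_assign(image, codebook, beta=1.0))
```

Hard assignment discards how close a descriptor is to competing words, while soft assignment keeps that information; as `beta` grows, the soft weights concentrate on the nearest word and the code approaches hard assignment.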

Acknowledgments

This work is partially supported by a Discovery Grant to Professor Robert Bergevin from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Authors and Affiliations

Computer Vision and Systems Laboratory, Laval University, Quebec, QC, Canada
Mana Shahriari & Robert Bergevin

Corresponding author

Correspondence to Mana Shahriari.

About this article

Cite this article

Shahriari, M., Bergevin, R. Land-use scene classification: a comparative study on bag of visual word framework. Multimed Tools Appl 76, 23059–23075 (2017). https://doi.org/10.1007/s11042-016-4316-z

Received: 02 June 2016
Revised: 14 December 2016
Accepted: 27 December 2016
Published: 11 January 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11042-016-4316-z
