Abstract
With the successful launch of high spatial resolution (HSR) sensors, highly detailed spatial information has become available for remote sensing research. This improvement allows researchers to monitor environmental changes at a fine spatial scale. However, traditional pixel-based classification approaches cannot interpret high spatial resolution remote sensing imagery effectively. The bag of visual words (BoVW) framework, on the other hand, has become one of the most popular approaches for evaluating classification performance on remote sensing image datasets. While pixel-based approaches may not fully describe very high-resolution remote sensing images, the BoVW model narrows the gap between low-level features and high-level semantic features by generating an intermediate description of image features. This paper presents a comparative study that evaluates the potential of different coding approaches within the BoVW model for solving the land-use scene classification problem. First, this work summarizes the different configurations of the BoVW framework in terms of coding and clustering. Next, we perform an extensive evaluation of BoVW on land-use scene classification and retrieval. Finally, we draw several conclusions regarding the coding strategies of BoVW, the codebook size, and the number of training images. The approach is validated on two datasets commonly used in remote sensing: UC Merced, a 21-class land-use dataset, and RSDataset, a 19-class satellite scene dataset.
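To make the clustering and coding stages of BoVW concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain k-means for codebook construction and shows hard and soft assignment as two common coding choices; the abstract does not state which descriptors, coding methods, or parameters the study evaluates. The function names (`build_codebook`, `encode_hard`, `encode_soft`), the `beta` parameter, and the random toy descriptors are all hypothetical and used only for illustration.

```python
import numpy as np


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Cluster local descriptors with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise the k visual words from k distinct descriptors.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every centre.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def encode_hard(descriptors, centers):
    """Hard-assignment coding: each descriptor votes for its nearest word.

    The votes are pooled into an L1-normalised histogram.
    """
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(centers))
    return counts.astype(float) / counts.sum()


def encode_soft(descriptors, centers, beta=1.0):
    """Soft-assignment coding: each descriptor votes for all words.

    Votes are weighted by exp(-beta * squared distance) and averaged
    over the image's descriptors.
    """
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum avoids underflow; it cancels on normalising.
    w = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)
    return w.mean(axis=0)


# Toy data (assumption): random vectors stand in for local image descriptors.
rng = np.random.default_rng(42)
train_descriptors = rng.normal(size=(500, 8))  # pooled from training images
image_descriptors = rng.normal(size=(60, 8))   # descriptors of one image

codebook = build_codebook(train_descriptors, k=16)
hist_hard = encode_hard(image_descriptors, codebook)
hist_soft = encode_soft(image_descriptors, codebook)
```

Either vector serves as the intermediate image description that a classifier would consume. The codebook size `k` corresponds to one of the factors the study compares, alongside the coding strategy and the number of training images.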
Acknowledgments
This work is partially supported by a Discovery Grant to Professor Robert Bergevin from the Natural Sciences and Engineering Research Council of Canada (NSERC).
About this article
Cite this article
Shahriari, M., Bergevin, R. Land-use scene classification: a comparative study on bag of visual word framework. Multimed Tools Appl 76, 23059–23075 (2017). https://doi.org/10.1007/s11042-016-4316-z
DOI: https://doi.org/10.1007/s11042-016-4316-z