Land-use scene classification: a comparative study on bag of visual word framework

Multimedia Tools and Applications

Abstract

With the successful launch of high spatial resolution (HSR) sensors, highly detailed spatial information has become available for remote sensing research. This improvement allows researchers to monitor environmental changes on a small spatial scale. However, traditional pixel-based classification approaches cannot interpret high spatial resolution remote sensing imagery effectively. The bag of visual words (BoVW) framework, on the other hand, is becoming one of the most popular approaches for evaluating classification performance on remote sensing image datasets. While pixel-based approaches may not fully describe very high-resolution remote sensing images, the BoVW model narrows the gap between low-level features and high-level semantic features by generating an intermediate description of image features. This paper presents a comparative study that evaluates the potential of different coding approaches within the BoVW model for solving the land-use scene classification problem. First, this work summarizes different configurations of the BoVW framework in coding and clustering. Next, we perform an extensive evaluation of BoVW on land-use scene classification and retrieval. Finally, we draw several conclusions regarding the coding strategies of BoVW, the codebook size, and the number of training images. The approach is validated on two datasets commonly used in remote sensing: UC Merced, a 21-class land-use dataset, and RSDataset, a 19-class satellite scene dataset.
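To make the terms "clustering", "coding", and "codebook size" concrete, the following is a minimal sketch of a BoVW pipeline. It is not the authors' implementation: the function names (`build_codebook`, `encode_hard`, `encode_soft`), the plain k-means clustering, the `beta` smoothing parameter, and the use of generic descriptor arrays are all assumptions made for illustration. In practice the descriptors would come from a local feature extractor, and the abstract does not say which coding methods were compared.

```python
import numpy as np


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Cluster local descriptors into k visual words with plain k-means.

    k is the codebook size (an illustrative parameter name).
    """
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct descriptors.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[idx].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every centre.
        d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def encode_hard(descriptors, codebook):
    """Hard-assignment coding: each descriptor votes for its nearest word.

    Returns an L1-normalised histogram over the visual words.
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def encode_soft(descriptors, codebook, beta=1.0):
    """Soft-assignment coding: each descriptor votes for every word,
    weighted by a Gaussian-like kernel on its distance to that word.
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Subtract the per-row minimum before exponentiating for stability.
    w = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)
    hist = w.sum(axis=0)
    return hist / hist.sum()
```

Under these assumptions, a codebook would be built once from descriptors pooled over the training images, and each image would then be encoded into a fixed-length histogram that a standard classifier can consume. Swapping `encode_hard` for `encode_soft`, or changing `k`, corresponds to the kind of configuration choices the study compares.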


Acknowledgments

This work is partially supported by a Discovery Grant to Professor Robert Bergevin from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Mana Shahriari.

Cite this article

Shahriari, M., Bergevin, R. Land-use scene classification: a comparative study on bag of visual word framework. Multimed Tools Appl 76, 23059–23075 (2017). https://doi.org/10.1007/s11042-016-4316-z

  • DOI: https://doi.org/10.1007/s11042-016-4316-z
