Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification

Alqasrawi, Yousef; Neagu, Daniel; Cowling, Peter I.

doi:10.1007/s11760-011-0266-0


Original Paper
Published: 20 October 2011

Volume 7, pages 759–775, (2013)

Signal, Image and Video Processing



Abstract

The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorization tasks. Most approaches that use the BOW model in categorizing images ignore useful information that can be obtained from image classes to build visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and improves accuracy in natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from training images of one scene image dataset can plausibly represent another scene image dataset in the same domain. This helps reduce the time and effort needed to build new visual vocabularies. The proposed approach is evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, using support vector machines with a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.


Abbreviations

M :: Number of classes
C :: Set of M scene classes
V :: Set of M class-specific vocabularies
V_j :: Set of k visual words learned from training images of class j
v_i :: ith visual word
u_j :: jth visual word
|V| :: Size of visual vocabulary
h(d) :: Histogram of visual words for image d
h_i(d) :: Number of descriptors in image d assigned to visual word v_i
N_d :: Total number of descriptors in image d
L :: Number of levels on the spatial pyramid layout
h^l(d_ri) :: Histogram vector of BOW for image d at level l and sub-region r_i
c^l(d_ri) :: Colour moments vector for image d at level l and sub-region r_i
m :: Number of images in the training image dataset
T :: Real-valued threshold vector
T^l_ri :: Average density of keypoints at level l and image sub-region r_i over m images
H(d) :: Feature vector for image d resulting from the concatenation of BOW and weighted pyramidal colour moments
w :: Weight vector that indicates the importance of colour information
K :: Kernel function
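The symbols above can be tied together in a short sketch of how a feature vector H(d) might be assembled. It rests on several assumptions. The integrated vocabulary V is taken to be the concatenation of the M class-specific vocabularies V_j. Each descriptor is assigned to its nearest visual word by Euclidean distance, and h(d) is normalised by N_d. Level l of the spatial pyramid is treated as a 2^l x 2^l grid, with levels 0 to L-1, following the common spatial pyramid convention. Colour moments are taken to be the usual mean, standard deviation and cube-rooted third central moment per RGB channel. Finally, the weights w are supplied as input, because the keypoint density-based rule that derives them from T is not given in this summary. All function names are hypothetical.

```python
import math


def integrated_vocabulary(class_vocabs):
    """V: the M class-specific vocabularies V_1..V_M concatenated into one list."""
    return [word for vocab in class_vocabs for word in vocab]


def nearest_word(desc, vocab):
    """Index i of the visual word v_i closest to desc (squared Euclidean distance)."""
    return min(range(len(vocab)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, vocab[i])))


def bow_histogram(descs, vocab):
    """h(d): bin i is h_i(d), the count of descriptors assigned to v_i, divided by N_d."""
    h = [0.0] * len(vocab)
    for desc in descs:
        h[nearest_word(desc, vocab)] += 1.0
    n_d = len(descs)
    return [count / n_d for count in h] if n_d else h


def colour_moments(rgbs):
    """Mean, standard deviation and cube-rooted third central moment per RGB channel."""
    if not rgbs:
        return [0.0] * 9  # empty sub-region: no colour evidence
    n = len(rgbs)
    moments = []
    for ch in range(3):
        vals = [p[ch] for p in rgbs]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        third = sum((v - mean) ** 3 for v in vals) / n
        # A signed cube root keeps the direction of the skew.
        moments += [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]
    return moments


def cell_index(x, y, width, height, level):
    """Sub-region r_i containing (x, y) on the 2^l x 2^l grid of pyramid level l."""
    g = 2 ** level
    col = min(int(x * g / width), g - 1)
    row = min(int(y * g / height), g - 1)
    return row * g + col


def pyramid_feature(keypoints, pixels, width, height, vocab, weights, levels):
    """H(d): for each level l < L and sub-region r_i, h^l(d_ri) followed by w * c^l(d_ri).

    keypoints: list of (x, y, descriptor); pixels: list of (x, y, (r, g, b)).
    weights[l][i]: colour weight for sub-region r_i at level l, taken as given.
    """
    feature = []
    for level in range(levels):
        for r in range(4 ** level):
            descs = [d for x, y, d in keypoints if cell_index(x, y, width, height, level) == r]
            rgbs = [c for x, y, c in pixels if cell_index(x, y, width, height, level) == r]
            feature += bow_histogram(descs, vocab)
            feature += [weights[level][r] * moment for moment in colour_moments(rgbs)]
    return feature
```

Under these assumptions, each sub-region contributes |V| + 9 entries, and there are 4^0 + ... + 4^(L-1) sub-regions. For example, |V| = 4 and L = 2 give a vector of 5 x 13 = 65 entries.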


Author information

Authors and Affiliations

School of Computing, Informatics and Media (SCIM), University of Bradford, Horton D4.02, Bradford, BD7 1DP, UK
Yousef Alqasrawi
School of Computing, Informatics and Media (SCIM), University of Bradford, Horton D4.06, Bradford, BD7 1DP, UK
Daniel Neagu
School of Computing, Informatics and Media (SCIM), University of Bradford, Horton D4.04, Bradford, BD7 1DP, UK
Peter I. Cowling


Corresponding author

Correspondence to Yousef Alqasrawi.


About this article

Cite this article

Alqasrawi, Y., Neagu, D. & Cowling, P.I. Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification. SIViP 7, 759–775 (2013). https://doi.org/10.1007/s11760-011-0266-0


Received: 08 September 2011
Revised: 20 September 2011
Accepted: 21 September 2011
Published: 20 October 2011
Issue Date: July 2013
DOI: https://doi.org/10.1007/s11760-011-0266-0

Keywords
