Beyond bag of latent topics: spatial pyramid matching for scene category recognition

Lu, Fu-xiang; Huang, Jun

doi:10.1631/FITEE.1500070

Published: 13 October 2015

Volume 16, pages 817–828, (2015)
Frontiers of Information Technology & Electronic Engineering

Fu-xiang Lu¹ &
Jun Huang²


Abstract

We propose a heterogeneous, mid-level feature-based method for recognizing natural scene categories. The proposed feature introduces spatial information among latent topics by means of a spatial pyramid, while the latent topics are obtained by applying probabilistic latent semantic analysis (pLSA) to the bag-of-words representation. The proposed feature always performs better than standard pLSA, because in many cases the loss of spatial information adversely affects the performance of pLSA. Combining the various interest point detectors and local region descriptors used in the bag-of-words model yields further improvement on diverse scene category recognition tasks. We also propose a two-stage framework for multi-class classification. In the first stage, for each possible detector/descriptor pair, adaptive boosting classifiers select the most discriminative topics and then compute the posterior probabilities of an unknown image from the selected topics. The second stage uses the prod-max rule to combine information from multiple sources and assigns the unknown image to the scene category with the highest ‘final’ posterior probability. Experimental results on three benchmark scene datasets show that the proposed method exceeds most state-of-the-art methods.
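
The two ideas in the abstract, pooling latent topics over a spatial pyramid and fusing posteriors from several detector/descriptor channels, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each image patch has already been given a hard topic label (for example the most probable pLSA topic), that pyramid levels are concatenated without per-level weights, and that the "prod-max rule" means multiplying the per-channel posteriors and then taking the class with the largest product; the paper's exact definitions may differ. The function names `pyramid_topic_histogram` and `product_then_max` are hypothetical.

```python
import numpy as np


def pyramid_topic_histogram(xy, topics, n_topics, width, height, levels=2):
    """Concatenate per-cell topic histograms over a spatial pyramid.

    xy       : (N, 2) patch centres (x, y) in pixels
    topics   : (N,) integer topic labels in [0, n_topics) (assumed hard labels)
    levels   : finest level; level l splits the image into 2**l x 2**l cells
    """
    xy = np.asarray(xy, dtype=float)
    topics = np.asarray(topics, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Map each patch to its grid cell, clamping points on the far border.
        cx = np.minimum((xy[:, 0] / width * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] / height * cells).astype(int), cells - 1)
        # One bin per (cell, topic) pair at this level.
        flat = (cy * cells + cx) * n_topics + topics
        hist = np.bincount(flat, minlength=cells * cells * n_topics)
        parts.append(hist.astype(float))
    feature = np.concatenate(parts)
    total = feature.sum()
    # L1-normalise so images with different patch counts are comparable.
    return feature / total if total > 0 else feature


def product_then_max(posteriors):
    """Fuse per-channel class posteriors by product, then pick the best class.

    posteriors : (n_sources, n_classes) array, one row per detector/descriptor
                 channel. Assumed reading of the prod-max rule.
    """
    p = np.asarray(posteriors, dtype=float)
    # Sum of logs is the product, computed in a numerically safer way.
    fused = np.exp(np.log(p + 1e-12).sum(axis=0))
    fused /= fused.sum()
    return int(np.argmax(fused)), fused


# Four patches, one near each image corner, with three possible topics.
feature = pyramid_topic_histogram(
    xy=[[10, 10], [90, 10], [10, 90], [90, 90]],
    topics=[0, 1, 1, 2],
    n_topics=3, width=100, height=100, levels=1,
)
label, fused = product_then_max([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
```

With `levels=1` the feature has 3 bins for the whole image plus 4 × 3 bins for the 2 × 2 grid. The coarse level alone is the plain bag-of-topics histogram, while the finer cells record where in the image each topic occurs.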

Author information

Authors and Affiliations

School of Information Science & Engineering, Lanzhou University, Lanzhou, 730000, China
Fu-xiang Lu
Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China
Jun Huang

Corresponding author

Correspondence to Fu-xiang Lu.

Additional information

Project supported by the Fundamental Research Funds for the Central Universities, China (No. lzujbky-2013-41), the National Natural Science Foundation of China (No. 61201446), and the Basic Scientific Research Business Expenses of the Central University and Open Project of Key Laboratory for Magnetism and Magnetic Materials of the Ministry of Education, Lanzhou University (No. LZUMMM2015010).

ORCID: Fu-xiang LU, http://orcid.org/0000-0002-5810-7631

About this article

Cite this article

Lu, Fx., Huang, J. Beyond bag of latent topics: spatial pyramid matching for scene category recognition. Frontiers Inf Technol Electronic Eng 16, 817–828 (2015). https://doi.org/10.1631/FITEE.1500070

Received: 07 March 2015
Revised: 14 July 2015
Accepted: 21 September 2015
Published: 13 October 2015
Issue Date: October 2015
DOI: https://doi.org/10.1631/FITEE.1500070

Keywords

Document code

A

CLC number

TP391.4
