Beyond bag of latent topics: spatial pyramid matching for scene category recognition

Frontiers of Information Technology & Electronic Engineering

Abstract

We propose a method for recognizing natural scene categories that is based on a heterogeneous, mid-level feature. The feature adds spatial information to the latent topics by means of a spatial pyramid; the latent topics themselves are obtained by applying probabilistic latent semantic analysis (pLSA) to the bag-of-words representation. The proposed feature always performs better than standard pLSA, whose performance suffers in many cases from the loss of spatial information. Combining the various interest point detectors and local region descriptors used in the bag-of-words model yields further improvement across diverse scene category recognition tasks. We also propose a two-stage framework for multi-class classification. In the first stage, for each possible detector/descriptor pair, adaptive boosting classifiers select the most discriminative topics and then compute the posterior probabilities of an unknown image from the selected topics. The second stage uses the prod-max rule to combine the information from these multiple sources and assigns the unknown image to the scene category with the highest ‘final’ posterior probability. Experimental results on three benchmark scene datasets show that the proposed method outperforms most state-of-the-art methods.
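
The two ideas in the abstract, pooling latent topics over a spatial pyramid and fusing per-source posteriors, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the input layout (patch coordinates normalized to [0, 1] plus a per-patch topic posterior matrix), the unweighted pyramid levels, and the reading of "prod-max" as "multiply posteriors across sources, then take the class with the maximum product" are all assumptions made for illustration; the abstract does not define them.

```python
import numpy as np

def pyramid_topic_histogram(xy, topic_post, levels=2):
    """Concatenate per-cell topic histograms over pyramid levels 0..levels.

    xy         : (N, 2) patch centres, assumed normalized to [0, 1]
    topic_post : (N, Z) per-patch topic posteriors, e.g. from a fitted pLSA model
    Both inputs are hypothetical; level weighting is omitted for brevity.
    """
    xy = np.asarray(xy, dtype=float)
    topic_post = np.asarray(topic_post, dtype=float)
    n_topics = topic_post.shape[1]
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid at this level is cells x cells
        # Cell index of each patch; clip so that coordinate 1.0 lands in the last cell.
        ix = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        iy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        flat = iy * cells + ix
        hist = np.zeros((cells * cells, n_topics))
        np.add.at(hist, flat, topic_post)  # accumulate topic mass per cell
        feats.append(hist.ravel())
    feat = np.concatenate(feats)
    total = feat.sum()
    return feat / total if total > 0 else feat

def combine_prod_max(posteriors):
    """Fuse class posteriors from several sources (assumed prod-max reading).

    posteriors : (S, C) array; row s holds the class posteriors produced by
                 source s (one detector/descriptor pair).
    Returns the winning class index and the per-class products.
    """
    p = np.asarray(posteriors, dtype=float)
    fused = np.prod(p, axis=0)  # product over sources
    return int(np.argmax(fused)), fused  # max over classes

# Two patches in opposite corners, two topics, one extra pyramid level.
f = pyramid_topic_histogram([[0.1, 0.1], [0.9, 0.9]], [[1, 0], [0, 1]], levels=1)
label, fused = combine_prod_max([[0.6, 0.4], [0.3, 0.7]])
```

With Z topics and levels 0 to 2, the feature has 21·Z entries (1 + 4 + 16 cells). In the pipeline the abstract describes, the rows passed to the fusion step would come from the boosted classifiers of the first stage.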

Author information

Corresponding author

Correspondence to Fu-xiang Lu.

Additional information

Project supported by the Fundamental Research Funds for the Central Universities, China (No. lzujbky-2013-41), the National Natural Science Foundation of China (No. 61201446), and the Basic Scientific Research Business Expenses of the Central University and Open Project of Key Laboratory for Magnetism and Magnetic Materials of the Ministry of Education, Lanzhou University (No. LZUMMM2015010).

ORCID: Fu-xiang LU, http://orcid.org/0000-0002-5810-7631

About this article

Cite this article

Lu, Fx., Huang, J. Beyond bag of latent topics: spatial pyramid matching for scene category recognition. Frontiers Inf Technol Electronic Eng 16, 817–828 (2015). https://doi.org/10.1631/FITEE.1500070

  • DOI: https://doi.org/10.1631/FITEE.1500070
