Abstract
Currently, the most popular image classification methods are based on global image representations. These methods face an inherent conflict: the position of the object in an image is uncertain, yet the whole image is described by a single global representation. In this paper, we propose a novel location-aware image classification framework to address this problem. In our framework, an image is classified based on a local image representation, and the classifier is learned by iterative multi-instance learning with a latent SVM; that is, we infer the object location with a latent SVM to improve image classification. Our method is very efficient and outperforms both the popular spatial pyramid matching (SPM) method and the Region-Based Latent SVM (RBLSVM) method [1] on the challenging PASCAL VOC dataset.
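The abstract describes the training loop only at a high level. The following is a minimal sketch of the general idea (multi-instance learning in which the object location is a latent variable), in the spirit of MI-SVM [19] and the latent SVM [17]. It is not the authors' implementation. Every detail below is an assumption made for illustration: each image is already a bag of candidate-region feature vectors with a constant 1.0 appended as a bias feature; a positive image's latent location is its single best-scoring region; every region of a negative image is a negative instance; the first round uses the mean of a positive image's regions; and the linear SVM is a bare-bones dual coordinate-descent solver (the kind of update LIBLINEAR [41] uses, without its shrinking or random ordering). The function names and toy data are hypothetical.

```python
# Hypothetical sketch of latent-region multi-instance learning for image
# classification (MI-SVM / latent-SVM style); not the paper's implementation.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(xs, ys, C=10.0, epochs=200):
    """L2-regularised, L1-loss linear SVM trained by dual coordinate descent.

    Assumption: the bias is handled by a constant 1.0 feature in every vector.
    """
    n, d = len(xs), len(xs[0])
    alpha = [0.0] * n
    w = [0.0] * d
    q = [dot(x, x) for x in xs]  # diagonal of the dual Hessian (y_i^2 = 1)
    for _ in range(epochs):
        for i in range(n):  # fixed sweep order keeps the sketch deterministic
            g = ys[i] * dot(w, xs[i]) - 1.0  # partial derivative of the dual objective
            new = min(max(alpha[i] - g / q[i], 0.0), C)  # Newton step clipped to [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                # maintain w = sum_j alpha_j * y_j * x_j incrementally
                w = [wj + delta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def best_region(w, bag):
    """Latent step: index of the highest-scoring region (inferred object location)."""
    return max(range(len(bag)), key=lambda k: dot(w, bag[k]))


def score_bag(w, bag):
    """Image-level score: the score of the image's best region."""
    return max(dot(w, r) for r in bag)


def train_latent_mil(pos_bags, neg_bags, rounds=5, C=10.0):
    """Alternate between picking one region per positive image and retraining the SVM."""
    dim = len(pos_bags[0][0])
    w = None
    for _ in range(rounds):
        xs, ys = [], []
        for bag in pos_bags:
            if w is None:
                # first round (assumption): no model yet, so use the mean of all regions
                xs.append([sum(r[j] for r in bag) / len(bag) for j in range(dim)])
            else:
                xs.append(bag[best_region(w, bag)])
            ys.append(1)
        for bag in neg_bags:
            # a negative image contains no object, so every region is a negative instance
            xs.extend(bag)
            ys.extend([-1] * len(bag))
        w = train_linear_svm(xs, ys, C=C)
    return w


# Toy data (hypothetical): each region is [f1, f2, 1.0]; f1 is large only on the object.
pos_bags = [
    [[2.0, 0.1, 1.0], [0.1, 0.2, 1.0], [0.0, 0.3, 1.0]],
    [[0.2, 0.1, 1.0], [1.8, 0.0, 1.0], [0.1, 0.1, 1.0]],
    [[0.0, 0.2, 1.0], [0.3, 0.1, 1.0], [2.2, 0.2, 1.0]],
]
neg_bags = [
    [[0.1, 0.2, 1.0], [0.2, 0.0, 1.0], [0.0, 0.1, 1.0]],
    [[0.3, 0.3, 1.0], [0.1, 0.2, 1.0], [0.2, 0.1, 1.0]],
]

w = train_latent_mil(pos_bags, neg_bags)
print("inferred object regions:", [best_region(w, b) for b in pos_bags])
print("positive scores:", [round(score_bag(w, b), 2) for b in pos_bags])
print("negative scores:", [round(score_bag(w, b), 2) for b in neg_bags])
```

In this sketch, a negative image contributes all of its regions while a positive image contributes only one, so the training set is imbalanced. Detectors trained with a latent SVM, such as [17], instead mine hard negatives. This is one of several design choices that the sketch simplifies away.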
References
Yakhnenko, O., Verbeek, J., Schmid, C.: Region-based image classification with a latent SVM model. Research report RR-7665, INRIA (2011)
Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: Proceedings of ICCV, pp. 1470–1477 (2003)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of CVPR (2006)
Grauman, K., Darrell, T.: The pyramid match kernel: discriminative classification with sets of image features. In: Proceedings of ICCV (2005)
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: Proceedings of CVPR (2010)
Song, Z., Chen, Q., Huang, Z., Hua, Y., Yan, S.: Contextualizing object detection and classification. In: Proceedings of CVPR (2011)
Xie, L., Tian, Q., Wang, M., Zhang, B.: Spatial pooling of heterogeneous features for image classification. IEEE Trans. Image Process. 23, 1994–2008 (2014)
Qi, G.J., Hua, X.S., Rui, Y., Tang, J., Zhang, H.J.: Image classification with kernelized spatial-context. IEEE Trans. Multimedia 12, 278–287 (2010)
Dietterich, T.G., Lathrop, R.H., Lozano-Perez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89, 31–71 (1997)
Wang, X., Bai, X., Liu, W., Latecki, L.J.: Feature context for image classification and object detection. In: Proceedings of CVPR (2011)
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1645 (2010)
Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24, 509–522 (2002)
Andrews, S., Tsochantaridis, I., Hofmann, T.: Support vector machines for multiple-instance learning. In: Proceedings of Advances in Neural Information Processing Systems (2003)
Hong, R., Wang, M., Gao, Y., Tao, D., Li, X., Wu, X.: Image annotation by multiple-instance learning with discriminative feature mapping and selection. IEEE Trans. Cybern. 44, 669–680 (2014)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge (VOC2007) (2007), Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Wang, M., Li, G., Lu, Z., Gao, Y., Chua, T.S.: When Amazon meets Google: product visualization by exploring multiple web sources. ACM Trans. Internet Technol. (TOIT) 12, 12 (2013)
Wang, M., Li, H., Tao, D., Lu, K., Wu, X.: Multimodal graph-based reranking for web image search. IEEE Trans. Image Process. 21, 4649–4661 (2012)
Wang, X., Feng, B., Bai, X., Liu, W., Latecki, L.J.: Bag of contour fragments for robust shape classification. Pattern Recogn. 47, 2116–2125 (2014)
Zhu, J., Wu, T., Zhu, J., Yang, X., Zhang, W.: Learning reconfigurable scene representation by tangram model. In: 2012 IEEE Workshop on Applications of Computer Vision (WACV), pp. 449–456. IEEE (2012)
Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of CVPR (2003)
Lee, Y.J., Grauman, K.: Object-graphs for context-aware category discovery. IEEE Trans. Pattern Anal. Mach. Intell. 34, 346–358 (2011)
Yuan, J., Wu, Y.: Spatial random partition for common visual pattern discovery. In: Proceedings of ICCV (2007)
Zhu, L.L., Lin, C.X., Huang, H., Chen, Y., Yuille, A.L.: Unsupervised structure learning: hierarchical recursive composition, suspicious coincidence and competitive exclusion. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 759–773. Springer, Heidelberg (2008)
Zhu, J., Zou, W., Yang, X., Zhang, R., Zhou, Q., Zhang, W.: Image classification by hierarchical spatial pooling with partial least squares analysis. In: BMVC, pp. 1–11 (2012)
Khan, I., Roth, P.M., Bischof, H.: Learning object detectors from weakly-labeled internet images. In: OAGM Workshop (2010)
Alexe, B., Deselaers, T., Ferrari, V.: What is an object? In: Proceedings of CVPR (2010)
Vijayanarasimhan, S., Grauman, K.: Keywords to visual categories: multiple-instance learning for weakly supervised object categorization. In: Proceedings of CVPR (2008)
Pandey, M., Lazebnik, S.: Scene recognition and weakly supervised object localization with deformable part-based models. In: Proceedings of ICCV (2011)
Russakovsky, O., Lin, Y., Yu, K., Fei-Fei, L.: Object-centric spatial pooling for image classification. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 1–15. Springer, Heidelberg (2012)
Harzallah, H., Jurie, F., Schmid, C.: Combining efficient object localization and image classification. In: Proceedings of ICCV (2009)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: Proceedings of CVPR (2009)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of CVPR (2005)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175 (2001)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
Quack, T., Ferrari, V., Leibe, B., Van Gool, L.: Efficient mining of frequent and distinctive feature configurations. In: Proceedings of ICCV (2007)
Liu, C., Yuen, J., Torralba, A., Sivic, J., Freeman, W.T.: SIFT flow: dense correspondence across different scenes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 28–42. Springer, Heidelberg (2008)
Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: Proceedings of the British Machine Vision Conference (BMVC) (2011)
Acknowledgments
This work was primarily supported by National Natural Science Foundation of China (NSFC) (No. 61503145). This material is also based upon work supported by the NSF under Grants No. IIS-1302164 and OIA-1027897.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, X., Yang, X., Liu, W., Duan, C., Latecki, L.J. (2016). Location-Aware Image Classification. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_69
DOI: https://doi.org/10.1007/978-3-319-27671-7_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science, Computer Science (R0)