Location-Aware Image Classification

Wang, Xinggang; Yang, Xin; Liu, Wenyu; Duan, Chen; Latecki, Longin Jan

doi:10.1007/978-3-319-27671-7_69

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Most popular image classification methods today rely on global image representations. This creates an inherent tension: the position of the object in the image is uncertain, yet a global representation describes the whole image regardless of where the object lies. In this paper, we propose a location-aware image classification framework to address this problem. In our framework, an image is classified from a local image representation, and the classifier is learned by iterative multi-instance learning with a latent SVM; that is, we infer the object location with the latent SVM to improve image classification. Our method is efficient and outperforms the popular spatial pyramid matching (SPM) method and the Region-Based Latent SVM (RBLSVM) method [1] on the challenging PASCAL VOC dataset.
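The alternation described in the abstract (infer the object location, retrain the classifier, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each image is a bag of candidate-window feature vectors, that the latent variable is the index of the window containing the object, and that training starts from the mean of each positive bag as a stand-in for a global representation. All names (`train_linear_svm`, `best_window`, `train_location_aware`, `classify`) and the toy 2-D features are hypothetical, and a plain subgradient solver stands in for an off-the-shelf SVM solver.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Bag = List[Vector]  # one image = a bag of candidate-window feature vectors


def score(w: Vector, b: float, x: Vector) -> float:
    """Linear classifier response w.x + b for one window."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=500):
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * score(w, b, x) < 1.0:  # margin violated
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def best_window(w: Vector, b: float, bag: Bag) -> int:
    """Latent step: index of the highest-scoring window in a bag."""
    return max(range(len(bag)), key=lambda i: score(w, b, bag[i]))


def bag_mean(bag: Bag) -> List[float]:
    return [sum(col) / len(bag) for col in zip(*bag)]


def train_location_aware(pos_bags, neg_bags, max_rounds=10):
    """Alternate between SVM training and window re-selection."""
    negatives = [x for bag in neg_bags for x in bag]  # every window is negative
    positives = [bag_mean(bag) for bag in pos_bags]   # assumed initialisation
    chosen = None
    for _ in range(max_rounds):
        xs = positives + negatives
        ys = [1.0] * len(positives) + [-1.0] * len(negatives)
        w, b = train_linear_svm(xs, ys)
        new_chosen = [best_window(w, b, bag) for bag in pos_bags]
        if new_chosen == chosen:  # window selection is stable: stop
            break
        chosen = new_chosen
        positives = [bag[i] for bag, i in zip(pos_bags, chosen)]
    return w, b, chosen


def classify(w: Vector, b: float, bag: Bag) -> float:
    """Image score = response of its best window."""
    return max(score(w, b, x) for x in bag)


# Toy data: the "object" window lies near (3, 3), background windows near (0, 0).
pos_bags = [
    [(0.2, 0.1), (3.0, 2.8), (0.1, 0.4)],
    [(2.9, 3.1), (0.3, 0.2)],
    [(0.0, 0.3), (0.4, 0.1), (3.2, 3.0)],
]
neg_bags = [
    [(0.1, 0.2), (0.3, 0.0)],
    [(0.4, 0.3), (0.0, 0.1), (0.2, 0.2)],
]

w, b, chosen = train_location_aware(pos_bags, neg_bags)
print("selected window per positive image:", chosen)
print("positive scores:", [round(classify(w, b, bag), 2) for bag in pos_bags])
print("negative scores:", [round(classify(w, b, bag), 2) for bag in neg_bags])
```

Scoring an image by its best window is what makes the classifier location-aware in this sketch: background windows in a positive image no longer dilute the representation, and the selected index doubles as a coarse localisation. The fixed step size and round limit are arbitrary choices for the toy data.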

References

Yakhnenko, O., Verbeek, J., Schmid, C.: Region-based image classification with a latent SVM model. Research report RR-7665, INRIA (2011)
Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: Proceedings of ICCV, pp. 1470–1477 (2003)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of CVPR (2006)
Grauman, K., Darrell, T.: The pyramid match kernel: discriminative classification with sets of image features. In: Proceedings of ICCV (2005)
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: Proceedings of CVPR (2010)
Song, Z., Chen, Q., Huang, Z., Hua, Y., Yan, S.: Contextualizing object detection and classification. In: Proceedings of CVPR (2011)
Xie, L., Tian, Q., Wang, M., Zhang, B.: Spatial pooling of heterogeneous features for image classification. IEEE Trans. Image Process. 23, 1994–2008 (2014)
Qi, G.J., Hua, X.S., Rui, Y., Tang, J., Zhang, H.J.: Image classification with kernelized spatial-context. IEEE Trans. Multimedia 12, 278–287 (2010)
Dietterich, T.G., Lathrop, R.H., Lozano-Perez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89, 31–71 (1997)
Wang, X., Bai, X., Liu, W., Latecki, L.J.: Feature context for image classification and object detection. In: Proceedings of CVPR (2011)
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1645 (2010)
Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24, 509–522 (2002)
Andrews, S., Tsochantaridis, I., Hofmann, T.: Support vector machines for multiple-instance learning. In: Proceedings of Advances in Neural Information Processing Systems (2003)
Hong, R., Wang, M., Gao, Y., Tao, D., Li, X., Wu, X.: Image annotation by multiple-instance learning with discriminative feature mapping and selection. IEEE Trans. Cybern. 44, 669–680 (2014)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Wang, M., Li, G., Lu, Z., Gao, Y., Chua, T.S.: When Amazon meets Google: product visualization by exploring multiple web sources. ACM Trans. Internet Technol. (TOIT) 12, 12 (2013)
Wang, M., Li, H., Tao, D., Lu, K., Wu, X.: Multimodal graph-based reranking for web image search. IEEE Trans. Image Process. 21, 4649–4661 (2012)
Wang, X., Feng, B., Bai, X., Liu, W., Latecki, L.J.: Bag of contour fragments for robust shape classification. Pattern Recogn. 47, 2116–2125 (2014)
Zhu, J., Wu, T., Zhu, J., Yang, X., Zhang, W.: Learning reconfigurable scene representation by tangram model. In: 2012 IEEE Workshop on Applications of Computer Vision (WACV), pp. 449–456. IEEE (2012)
Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2003)
Lee, Y.J., Grauman, K.: Object-graphs for context-aware category discovery. IEEE Trans. Pattern Anal. Mach. Intell. 34, 346–358 (2011)
Yuan, J., Wu, Y.: Spatial random partition for common visual pattern discovery. In: Proceedings of ICCV (2007)
Zhu, L.L., Lin, C.X., Huang, H., Chen, Y., Yuille, A.L.: Unsupervised structure learning: hierarchical recursive composition, suspicious coincidence and competitive exclusion. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 759–773. Springer, Heidelberg (2008)
Zhu, J., Zou, W., Yang, X., Zhang, R., Zhou, Q., Zhang, W.: Image classification by hierarchical spatial pooling with partial least squares analysis. In: BMVC, pp. 1–11 (2012)
Khan, I., Roth, P.M., Bischof, H.: Learning object detectors from weakly-labeled internet images. In: OAGM Workshop (2010)
Alexe, B., Deselaers, T., Ferrari, V.: What is an object? In: Proceedings of CVPR (2010)
Vijayanarasimhan, S., Grauman, K.: Keywords to visual categories: multiple-instance learning for weakly supervised object categorization. In: Proceedings of CVPR (2008)
Pandey, M., Lazebnik, S.: Scene recognition and weakly supervised object localization with deformable part-based models. In: Proceedings of ICCV (2011)
Russakovsky, O., Lin, Y., Yu, K., Fei-Fei, L.: Object-centric spatial pooling for image classification. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 1–15. Springer, Heidelberg (2012)
Harzallah, H., Jurie, F., Schmid, C.: Combining efficient object localization and image classification. In: International Conference on Computer Vision (2009)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: Proceedings of CVPR (2009)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175 (2001)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
Quack, T., Ferrari, V., Leibe, B., Van Gool, L.: Efficient mining of frequent and distinctive feature configurations. In: International Conference on Computer Vision (ICCV 2007) (2007)
Liu, C., Yuen, J., Torralba, A., Sivic, J., Freeman, W.T.: SIFT flow: dense correspondence across different scenes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 28–42. Springer, Heidelberg (2008)
Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: Proceedings of the British Machine Vision Conference (BMVC) (2011)

Acknowledgments

This work was primarily supported by the National Natural Science Foundation of China (NSFC) (No. 61503145). This material is also based upon work supported by the NSF under Grants No. IIS-1302164 and OIA-1027897.

Author information

Authors and Affiliations

School of EIC, Huazhong University of Science and Technology, Wuhan, China
Xinggang Wang, Xin Yang & Wenyu Liu
Wuhan Second Ship Design and Research Institute, Wuhan, China
Chen Duan
Department of CIS, Temple University, Philadelphia, PA, USA
Longin Jan Latecki

Corresponding author

Correspondence to Xinggang Wang.

Editor information

Editors and Affiliations

University of Texas at San Antonio, San Antonio, USA
Qi Tian
Dept. of Information Engineering, University of Trento, Povo, Trento, Italy
Nicu Sebe
EECS, University of Central Florida, Orlando, Florida, USA
Guo-Jun Qi
EURECOM, Sophia-Antipolis, France
Benoit Huet
Hefei University of Technology, Hefei, Anhui, China
Richang Hong
School of Computing and Information, Hefei University of Technology, Hefei, Anhui, China
Xueliang Liu

About this paper

Cite this paper

Wang, X., Yang, X., Liu, W., Duan, C., Latecki, L.J. (2016). Location-Aware Image Classification. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_69

DOI: https://doi.org/10.1007/978-3-319-27671-7_69
Published: 03 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science, Computer Science (R0)
