Can Visual Recognition Benefit from Auxiliary Information in Training?

Zhang, Qilin; Hua, Gang; Liu, Wei; Liu, Zicheng; Zhang, Zhengyou

doi:10.1007/978-3-319-16865-4_5


Conference paper
First Online: 01 January 2015


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Abstract

We examine an under-explored visual recognition problem in which the training data contain both a main view and an auxiliary view of visual information, but only the main view is available at test time. To leverage the auxiliary view effectively and train a stronger classifier, we propose a collaborative auxiliary learning framework based on a new discriminative canonical correlation analysis. The framework uncovers a common semantic space shared by both views by enforcing a series of nonlinear projections. These projections automatically embed the discriminative cues hidden in both views into the common space, which improves recognition on test data that come only from the main view. We demonstrate the efficacy of the proposed auxiliary learning approach on three challenging visual recognition tasks with different kinds of auxiliary information.
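
The train-with-two-views, test-with-one setting described in the abstract can be illustrated with a minimal sketch. It uses plain regularized linear canonical correlation analysis, not the discriminative, nonlinear variant the paper proposes, and it runs on synthetic data. The function names (`fit_linear_cca`, `project_main`, `make_two_view_data`), the regularization value, and the nearest-centroid classifier are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_cca(X_main, X_aux, n_components=1, reg=1e-3):
    """Plain regularized linear CCA (an assumption; NOT the paper's method).

    Returns the main-view mean, the main-view projection matrix, and the
    leading canonical correlations.
    """
    mean_main = X_main.mean(axis=0)
    Xc = X_main - mean_main
    Yc = X_aux - X_aux.mean(axis=0)
    n = Xc.shape[0]
    # Regularized within-view covariances and the cross-view covariance.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(Xc.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Yc.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views via Cholesky factors, then take an SVD of the
    # whitened cross-covariance; singular values are canonical correlations.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, _ = np.linalg.svd(T, full_matrices=False)
    W_main = np.linalg.solve(Lx.T, U[:, :n_components])
    return mean_main, W_main, s[:n_components]


def project_main(X_main, mean_main, W_main):
    """Map main-view features into the shared space (no auxiliary view needed)."""
    return (X_main - mean_main) @ W_main


def make_two_view_data(rng, n):
    """Hypothetical synthetic data: one class-dependent latent drives both views."""
    y = rng.integers(0, 2, size=n)
    z = (2.0 * y - 1.0) + 0.3 * rng.standard_normal(n)
    X_main = z[:, None] * np.ones(5) + rng.standard_normal((n, 5))
    X_aux = z[:, None] * np.ones(4) + rng.standard_normal((n, 4))
    return X_main, X_aux, y


rng = np.random.default_rng(0)
X_main_tr, X_aux_tr, y_tr = make_two_view_data(rng, 200)
X_main_te, _, y_te = make_two_view_data(rng, 50)  # auxiliary view unused at test time

# Training uses both views; the learned projection applies to the main view only.
mean_main, W_main, corrs = fit_linear_cca(X_main_tr, X_aux_tr, n_components=1)
Z_tr = project_main(X_main_tr, mean_main, W_main)
Z_te = project_main(X_main_te, mean_main, W_main)

# Nearest-centroid classification in the shared space.
centroids = np.stack([Z_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(Z_te[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)
accuracy = (y_pred == y_te).mean()
print("first canonical correlation:", corrs[0])
print("test accuracy (main view only):", accuracy)
```

The point of the sketch is the data flow: the auxiliary view influences the learned projection during training, while prediction needs only main-view features. Class labels play no role in fitting the projection here, which is one way this sketch differs from a discriminative formulation.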


Notes

1. Note that Eq. (5) is a linear version of Eq. (7) and has a very similar solution. For conciseness, the solution to Eq. (5) is omitted.
2. The original form of SVM2K is not directly applicable to the missing view problem.


Acknowledgement

Research reported in this publication was partly supported by the National Institute of Nursing Research of the National Institutes of Health under Award Number R01NR015371. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work is also partly supported by US National Science Foundation Grant IIS 1350763, China National Natural Science Foundation Grant 61228303, GH's start-up funds from Stevens Institute of Technology, a Google Research Faculty Award, a gift grant from Microsoft Research, and a gift grant from NEC Labs America.

Author information

Authors and Affiliations

Stevens Institute of Technology, Hoboken, NJ, USA
Qilin Zhang & Gang Hua
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Wei Liu
Microsoft Research, Redmond, WA, USA
Zicheng Liu & Zhengyou Zhang


Corresponding author

Correspondence to Gang Hua.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang



Cite this paper

Zhang, Q., Hua, G., Liu, W., Liu, Z., Zhang, Z. (2015). Can Visual Recognition Benefit from Auxiliary Information in Training? In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_5


DOI: https://doi.org/10.1007/978-3-319-16865-4_5
Published: 16 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer Science, Computer Science (R0)
