Sparse Output Coding for Scalable Visual Recognition

Zhao, Bin; Xing, Eric P.

doi:10.1007/s11263-015-0839-4


Published: 26 June 2015

Volume 119, pages 60–75 (2016)

International Journal of Computer Vision

Bin Zhao¹ &
Eric P. Xing¹


Abstract

Many vision tasks require a multi-class classifier that discriminates among hundreds or thousands of categories. In this paper, we propose sparse output coding, a principled approach to large-scale multi-class classification that turns high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding consists of two steps: efficient learning of a coding matrix that scales to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness of the proposed approach.
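To make the idea of bit-by-bit decoding concrete, the following is a minimal sketch of generic output-code decoding with a sparse ternary coding matrix. It is not the authors' algorithm: the function name `decode`, the toy matrix, the hand-picked bit probabilities, and the choice to average log-likelihoods over each class's non-zero bits are all assumptions made for illustration. Each column of the matrix stands for one binary classifier, and a zero entry means that classifier ignores the class.

```python
import numpy as np

def decode(code_matrix, bit_probs, eps=1e-12):
    """Pick the class whose codeword best explains the per-bit probabilities.

    code_matrix: (n_classes, n_bits), entries in {-1, 0, +1}; 0 = bit unused.
    bit_probs:   (n_bits,), estimated P(bit = +1 | x) per binary classifier.
    Returns (predicted class index, per-class scores).
    """
    codes = np.asarray(code_matrix)
    p = np.clip(np.asarray(bit_probs, dtype=float), eps, 1.0 - eps)

    pos = (codes == 1).astype(float)   # bits where the class expects +1
    neg = (codes == -1).astype(float)  # bits where the class expects -1

    # Sum the log-likelihood of each class's non-zero bits.
    log_lik = pos @ np.log(p) + neg @ np.log(1.0 - p)

    # Assumption: average over used bits so that classes with sparser
    # codewords are not favoured simply for having fewer terms.
    used = np.maximum(pos.sum(axis=1) + neg.sum(axis=1), 1.0)
    scores = log_lik / used
    return int(np.argmax(scores)), scores

# Hypothetical 4-class, 3-bit sparse code (illustrative only).
M = [[+1, +1,  0],
     [+1, -1,  0],
     [-1,  0, +1],
     [-1,  0, -1]]

label, scores = decode(M, [0.1, 0.5, 0.9])
print(label, scores)
```

In this toy setup, bit 0 splits the classes into two groups and bits 1 and 2 each refine one group, so every class is identified by only two of the three binary decisions.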



Author information

Authors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
Bin Zhao & Eric P. Xing


Corresponding author

Correspondence to Bin Zhao.

Additional information

Communicated by Antonio Torralba and Alexei Efros.


About this article

Cite this article

Zhao, B., Xing, E.P. Sparse Output Coding for Scalable Visual Recognition. Int J Comput Vis 119, 60–75 (2016). https://doi.org/10.1007/s11263-015-0839-4


Received: 15 May 2013
Accepted: 16 June 2015
Published: 26 June 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s11263-015-0839-4
