Sparse Output Coding for Scalable Visual Recognition

International Journal of Computer Vision

Abstract

Many vision tasks require a multi-class classifier that discriminates among hundreds or thousands of categories. In this paper, we propose sparse output coding, a principled approach to large-scale multi-class classification that turns high-cardinality multi-class categorization into a bit-by-bit decoding problem. Sparse output coding consists of two steps: efficient learning of a coding matrix, which scales to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness of the proposed approach.
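To make the two steps concrete, the following is a minimal sketch of output-code decoding in general, not the method of the paper. It assumes a small hand-written ternary coding matrix (the paper learns its matrix; the one here, the class names, and the `decode` helper are all hypothetical) and a set of per-bit probabilities that would normally come from trained binary predictors. Decoding scores each class by the average log-likelihood of its active bits, which is one simple heuristic chosen for illustration.

```python
import math

# Hypothetical ternary coding matrix: rows are classes, columns are bits.
# +1 / -1 place a class on the positive / negative side of a bit predictor;
# 0 means the bit ignores that class, which is what makes the code sparse.
CODING_MATRIX = {
    "cat":   [+1,  0, -1,  0],
    "dog":   [+1,  0, +1,  0],
    "car":   [-1, +1,  0,  0],
    "truck": [-1, -1,  0, +1],
    "bus":   [-1, -1,  0, -1],
}


def decode(bit_probs, coding_matrix=CODING_MATRIX, eps=1e-12):
    """Pick the class whose codeword best explains the bit probabilities.

    bit_probs[j] is assumed to be P(bit j = +1 | image), as produced by
    some binary predictor. Zero entries in a codeword are skipped, and the
    log-likelihood is averaged over active bits so that classes with
    shorter codewords are not favored (an illustrative design choice).
    """
    scores = {}
    for label, codeword in coding_matrix.items():
        log_lik = 0.0
        active = 0
        for code, p in zip(codeword, bit_probs):
            if code == 0:
                continue  # this bit carries no information about the class
            p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
            log_lik += math.log(p if code > 0 else 1.0 - p)
            active += 1
        scores[label] = log_lik / active if active else float("-inf")
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Bit 0 says "animal", bit 2 says "not dog-like"; bits 1 and 3 are unsure.
    label, scores = decode([0.9, 0.5, 0.2, 0.5])
    print(label)
```

With five classes the matrix needs only four bit predictors, and each prediction is assembled one bit at a time, which is the sense in which classification becomes a decoding problem.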

Author information

Corresponding author

Correspondence to Bin Zhao.

Additional information

Communicated by Antonio Torralba and Alexei Efros.

About this article

Cite this article

Zhao, B., Xing, E.P. Sparse Output Coding for Scalable Visual Recognition. Int J Comput Vis 119, 60–75 (2016). https://doi.org/10.1007/s11263-015-0839-4

  • DOI: https://doi.org/10.1007/s11263-015-0839-4
