Skip to main content
Log in

Fast Structured Prediction Using Large Margin Sigmoid Belief Networks

  • Published:
International Journal of Computer Vision Aims and scope Submit manuscript

Abstract

Images usually contain multiple objects that are semantically related to one another. Mapping from low-level visual features to mutually dependent high-level semantics can be formulated as a structured prediction problem. Current statistical models for structured prediction make simplifying assumptions about the underlying output graph structure, such as assuming a low-order Markov chain, because exact inference becomes intractable as the tree-width of the underlying graph increases. Approximate inference algorithms, on the other hand, force one to trade off representational power with computational efficiency. In this paper, we present large margin sigmoid belief networks (LMSBNs) for structured prediction in images. LMSBNs allow a very fast inference algorithm for arbitrary graph structures that runs in polynomial time with high probability. This probability is data-distribution dependent and is maximized in learning. The new approach overcomes the representation-efficiency trade-off in previous models and allows fast structured prediction with complicated graph structures. We present results from applying a fully connected model to semantic image annotation, image retrieval and optical character recognition (OCR) problems, and demonstrate that the proposed approach can yield significant performance gains over current state-of-the-art methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Abdou, S., & Scordilis, M. S. (2004). Beam search pruning in speech recognition using a posterior probability-based confidence measure. Speech Communication, 42(3–4), 409–428.

    Article  Google Scholar 

  • Bartlett, P. L., & Mendelson, S. (2002). Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482.

    MathSciNet  Google Scholar 

  • Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in neural information processing systems. Cambridge: MIT Press.

    Google Scholar 

  • Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems (Vol. 20, pp. 161–168).

    Google Scholar 

  • Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37, 1757–1771.

    Article  Google Scholar 

  • Carneiro, G., Chan, A., Moreno, P., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 394–410.

    Article  Google Scholar 

  • Collins, M., Globerson, A., Koo, T., Carreras, X., & Bartlett, P. L. (2008). Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. Journal of Machine Learning Research, 9, 1775–1822.

    MathSciNet  MATH  Google Scholar 

  • Darwiche, A. (2003). A differential approach to inference in Bayesian networks. Journal of the ACM, 50, 123–132.

    Article  MathSciNet  Google Scholar 

  • Daumé III, H., Langford, J., & Marcu, D. (2009). Search-based structured prediction. Machine Learning Journal, 75(3), 297–325.

    Article  Google Scholar 

  • Doucet, A., de Freitas, N., Murphy, K. P., & Russell, S. J. (2000). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Annual conference on uncertainty in artificial intelligence (pp. 176–183).

    Google Scholar 

  • Duygulu, P., Barnard, K., de Freitas, J., & Forsyth, D. (2006). Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In A. Heyden, G. Sparr, M. Nielsen, & P. Johansen (Eds.), Lecture notes in computer science: Vol. 2353. European conference on computer vision (pp. 349–354). Berlin: Springer.

    Google Scholar 

  • Fan, R. E., & Lin, C. J. (2007). A study on threshold selection for multi-label classification (Tech. rep.). National Taiwan University.

  • Feng, S., Manmatha R., & Lavrenko, V. (2004). Multiple Bernoulli relevance models for image and video annotation. In Computer vision and pattern recognition.

    Google Scholar 

  • Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In International conference on machine learning (pp. 304–311). New York: ACM.

    Google Scholar 

  • Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In International conference on computer vision (pp. 309–316).

    Chapter  Google Scholar 

  • Guo, Y., Wilkinson, D., & Schuurmans, D. (2005). Maximum margin Bayesian networks. In Annual conference on uncertainty in artificial intelligence.

    Google Scholar 

  • Hinton, G. E., & Sejnowski, T. J. (1983). Optimal perceptual inference. In Computer vision and pattern recognition (pp. 448–453).

    Google Scholar 

  • Hoefel, G., & Elkan, C. (2008). Learning a two-stage SVM/CRF sequence classifier. In ACM conference on information and knowledge management (pp. 271–278). New York, NY, USA.

    Google Scholar 

  • Hsieh, C. J., Chang, K. W., Lin, C. J., Keerthi, S. S., & Sundararajan, S. (2008). A dual coordinate descent method for large-scale linear SVM. In International conference on machine learning (pp. 408–415). New York: ACM.

    Google Scholar 

  • Joachims, T., Finley, T., & Yu, C. N. J. (2009). Cutting-plane training of structural SVMs. Machine Learning, 77(1), 27–59.

    Article  MATH  Google Scholar 

  • Kassel, R. H. (1995). A comparison of approaches to on-line handwritten character recognition. PhD thesis, Cambridge, MA, USA.

  • Kolmogorov, V., & Zabih, R. (2002). What energy functions can be minimized via graph cuts? In European conference on computer vision (pp. 185–208).

    Google Scholar 

  • Kulesza, A., & Pereira, F. (2007). Structured learning with approximate inference. In Advances in neural information processing systems.

    Google Scholar 

  • Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In International conference on machine learning.

    Google Scholar 

  • Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In Advances in neural information processing systems. Cambridge: MIT Press.

    Google Scholar 

  • Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In Computer vision and pattern recognition.

    Google Scholar 

  • LeCun, Y., Chopra, S., Hadsell, R., Huang, F. J., Bakir, G., Hofman, T., Schölkopf, B., Smola, A., & Taskar, B. (Eds.) (2006). A tutorial on energy-based learning. In Predicting structured data. Cambridge: MIT Press.

    Google Scholar 

  • Liu, J., Li, M., Liu, Q., Lu, H., & Ma, S. (2009). Image annotation via graph learning. Pattern Recognition, 42, 218–228.

    Article  MATH  Google Scholar 

  • Lowd, D., & Domingos, P. (2008). Learning arithmetic circuits. In Annual conference on uncertainty in artificial intelligence (pp. 383–392). Corvallis: AUAI Press.

    Google Scholar 

  • Makadia, A., Pavlovic, V., & Kumar, S. (2008). A new baseline for image annotation. In European conference on computer vision (pp. 316–329). Berlin: Springer.

    Google Scholar 

  • McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In Proc. 17th international conf. on machine learning (pp. 591–598). San Francisco: Morgan Kaufmann.

    Google Scholar 

  • Metzler, D., & Manmatha, R. (2004). An inference network approach to image retrieval. In The international conference on image and video retrieval (pp. 42–50). Berlin: Springer.

    Chapter  Google Scholar 

  • Miao, X., & Rao, R. P. (2009). Large margin Boltzmann machines. In International joint conference on artificial intelligence (pp. 1156–1162).

    Google Scholar 

  • Neal, R. M. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56(1), 71–113.

    Article  MathSciNet  MATH  Google Scholar 

  • Perez-Cruz, F., Ghahramani, Z., & Pontil, M. (2007). Conditional graphical models. Cambridge: MIT Press.

    Google Scholar 

  • Quattoni, A., Collins, M., & Darrell, T. (2004). Conditional random fields for object recognition. In Advances in neural information processing systems (pp. 1097–1104). Cambridge: MIT Press.

    Google Scholar 

  • Rosenberg, D., Klein, D., & Taskar, B. (2007). Mixture-of-parents maximum entropy Markov models. In Annual conference on uncertainty in artificial intelligence (pp. 318–325). Corvallis: AUAI Press.

    Google Scholar 

  • Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2008). Labelme: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1), 157–173.

    Article  Google Scholar 

  • Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. In International conference on machine learning (pp. 807–814). New York: ACM.

    Google Scholar 

  • Shalev-Shwartz, S., Srebro, N., & Sridharan, K. (2008). Fast rates for regularized objectives. In Advances in neural information processing systems.

    Google Scholar 

  • Tadepalli, P., & Natarajan, B. K. (1996). A formal framework for speedup learning from problems and solutions. The Journal of Artificial Intelligence Research, 4, 445–475.

    MathSciNet  MATH  Google Scholar 

  • Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. In Advances in neural information processing systems, Vancouver, Canada.

    Google Scholar 

  • Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector machine learning for interdependent and structured output spaces. In International conference on machine learning.

    Google Scholar 

  • Tsoumakas, G., Katakis, I., & Vlahavas, I. P. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook, pp. 667–685.

    Google Scholar 

  • Wainwright, M. J., Jaakkola, T., & Willsky, A. S. (2003). Tree-reweighted belief propagation algorithms and approximate Ml estimation via pseudo-moment matching. In AISTATS.

    Google Scholar 

  • Wainwright, M. J., Jaakkola, T., & Willsky, A. S. (2005a). A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51, 2313–2335.

    Article  MathSciNet  Google Scholar 

  • Wainwright, M. J., Jaakola, T., & Willsky, A. S. (2005b). MAP estimation via agreement on trees: message passing and linear programming. IEEE Transactions on Information Theory, 51, 3697–3717.

    Article  Google Scholar 

  • Yavlinsky, A., Schofield, E., & Rüger, S. (2005). Automated image annotation using global features and robust nonparametric density estimation. In The international conference on image and video retrieval (pp. 507–517). Berlin: Springer.

    Chapter  Google Scholar 

  • Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2005). Constructing free energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51, 2282–2312.

    Article  MathSciNet  Google Scholar 

  • Zhang, T. (2001). Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1), 56–85.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Xu Miao or Rajesh P. N. Rao.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Miao, X., Rao, R.P.N. Fast Structured Prediction Using Large Margin Sigmoid Belief Networks. Int J Comput Vis 99, 302–318 (2012). https://doi.org/10.1007/s11263-011-0423-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11263-011-0423-5

Keywords

Navigation