Pattern Recognition

Volume 60, December 2016, Pages 761-769

Sparse conditional copula models for structured output regression

https://doi.org/10.1016/j.patcog.2016.03.027

Highlights

  • Sparse non-linear, non-Gaussian density modeling by conditional copula.

  • Loose output correlation estimation by sparse copula inverse covariance learning.

  • Efficient alternating optimization method for marginals and copula.

  • Superior to existing multiple output regression methods on several datasets.

Abstract

We deal with the multiple output regression task, where the central theme is to capture the sparse correlation among the output variables. Sparse inverse covariance learning of linear Gaussian conditional models has recently been studied and shown to achieve superb prediction performance. However, it can fail when the underlying true input–output process is non-Gaussian and/or non-linear. We introduce a novel sparse conditional copula model to represent the joint density of the output variables. By incorporating a Gaussian copula function, while modeling the univariate marginal densities by (non-Gaussian) mixtures of experts, we achieve high flexibility in representation that admits non-linear and non-Gaussian densities. We then propose a sparse learning method for this copula-based model that effectively imposes sparsity in the conditional dependency among the output variables. The learning optimization is efficient as it decomposes into gradient-descent marginal density estimation and sparse inverse covariance learning for the copula function. Improved performance of the proposed approach is demonstrated on several interesting high-dimensional image/vision tasks.

Introduction

We consider the multiple output (or vector output) regression problem, where the goal is to predict multiple response variables from input covariates. In contrast to conventional single (scalar) output regression in statistics and machine learning, the central challenge is to capture the statistical correlation among the output variables efficiently. In this sense, the simple strategy of applying single output regression estimation independently to each of the output variables is considered suboptimal, since it completely ignores the output correlation in learning a predictive model. In the machine learning literature, there has been considerable recent work on estimating more accurate multiple output regressors beyond the independent treatment [1], [2], [3].

From the practitioner's point of view, effective output correlation modeling in regression is highly beneficial for several application problems. For instance, in the task of denoising hand-written character images, the output vector comprises the pixel intensities of a character image, where neighboring dimensions tend to have similar values (as either foreground character pixels or background), clearly exhibiting strong conditional dependency patterns among the output variables. Similarly, in the computer vision problem of motion estimation, where one predicts motions for the next few frames (an instance of a typical forecasting problem), the image features at two consecutive frames typically conform to the motion smoothness constraint, so the data contain high statistical correlation among the output variables.

However, for situations where the input/output data are high dimensional, one has to trade off faithful output correlation modeling against model complexity. One sensible approach is to confine the model parameter space with sparseness constraints, which motivates the recently emerging sparse inverse covariance estimation [4], [5], [6], [7], [8], [9]. Under Gaussian random field models (of either joint or conditional densities), these approaches impose sparseness on the inverse covariance matrix based on the following fact: a Gaussian density factorizes into pairwise potentials on output variables (say, y_i and y_j), with the corresponding coefficient equal (up to a constant) to the (i,j) component of the inverse covariance matrix. Therefore, having many zero entries in the inverse covariance leads to a loose statistical dependency structure.
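To make this fact explicit, write the Gaussian log-density in terms of the precision matrix $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1}$ (a standard identity):

$$\log N(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{1}{2} \sum_{i,j} \Lambda_{i,j} \, (y_i - \mu_i)(y_j - \mu_j) + \mathrm{const},$$

so $\Lambda_{i,j} = 0$ removes the pairwise potential between $y_i$ and $y_j$, which is equivalent to the conditional independence of $y_i$ and $y_j$ given the remaining output variables.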

While sparse inverse covariance estimation has achieved superb prediction performance, its main weakness is that most approaches are built on the strong assumption that the output is linear Gaussian given the input. Although this assumption leads to convex optimization, the linear Gaussian model family is less flexible and considerably restricted in representational capacity. Consequently, when the true data generating process is far from linear Gaussian, it may suffer from inaccurate prediction.

In this paper, we propose a novel sparse copula-based density model that admits much richer non-Gaussian and non-linear density models. The copula model has been studied considerably in the statistics community [10], [11], [12], [13], [14], [15], while its application to machine learning is relatively rare. A main difficulty in multivariate density modeling is choosing a proper function family beyond the multivariate Gaussian, and the copula alleviates this by decoupling the task into two subtasks: (i) build marginal distributions for individual variables, which is easy due to univariate modeling, and (ii) model the inter-correlation among the variables by the so-called copula function. The copula function can be any function that satisfies certain uniform marginal constraints.
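This decoupling is formalized by Sklar's theorem; in density form, for continuous marginal CDFs $F_1, \dots, F_p$ with densities $f_1, \dots, f_p$,

$$f(y_1, \dots, y_p) = c\bigl(F_1(y_1), \dots, F_p(y_p)\bigr) \prod_{i=1}^{p} f_i(y_i),$$

where $c(\cdot)$ is the copula density on $[0,1]^p$ carrying all of the inter-variable dependence.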

One may argue that defining the copula function is as difficult as choosing a multivariate density. However, the most popular strategy is to adopt the Gaussian copula function. It is known that as long as the marginal densities are not linear Gaussian, the resulting density is non-Gaussian. Moreover, by choosing sensible (non-Gaussian) marginal densities, the model can be endowed with high representational power beyond the Gaussian, hence providing a great deal of flexibility in modeling a non-linear and non-Gaussian multivariate distribution.
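For reference, the Gaussian copula density has the standard form

$$c(u_1, \dots, u_p) = \frac{1}{\sqrt{\det \mathbf{R}}} \exp\Bigl(-\frac{1}{2} \, \mathbf{z}^\top (\mathbf{R}^{-1} - \mathbf{I}) \, \mathbf{z}\Bigr), \qquad z_i = \Phi^{-1}(u_i),$$

where $\mathbf{R}$ is the copula correlation matrix and $\Phi^{-1}$ is the standard normal quantile function; the dependence among the variables enters only through $\mathbf{R}^{-1}$, precisely the matrix on which sparsity is imposed below.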

Indeed, the proposed model employs, as the univariate marginal densities, mixtures of sigmoid-weighted Gaussians. Also known as the mixture of experts [16], [17], this model divides the input space into regimes, each delineated by multiple hyperplanes, and in each regime the output variable conforms to a mixture of Gaussians, a rich density family that can approximate any density with high precision.
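A generic form of such a marginal, with sigmoid gates (the particular gate and expert parameterizations $\mathbf{v}_k$, $\mathbf{w}_k$, $\sigma_k$ below are illustrative placeholders, not necessarily the paper's exact choice), is

$$p(y_i \mid \mathbf{x}) = \sum_{k=1}^{K} g_k(\mathbf{x}) \, N\bigl(y_i; \mathbf{w}_k^\top \mathbf{x}, \sigma_k^2\bigr), \qquad g_k(\mathbf{x}) \propto \sigma(\mathbf{v}_k^\top \mathbf{x}),$$

where each hyperplane $\mathbf{v}_k$ delineates an input regime and the corresponding Gaussian expert models the output in that regime.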

In addition to the enriched representational capacity from flexible density modeling, an intriguing property of the proposed model is that conditional independence among the output variables can be directly imposed by a sparse inverse covariance in the copula function. Imposing sparseness on the inverse covariance of the copula function when learning a density model is the central idea of our approach, which is beneficial for capturing the most salient correlation among the output variables in high-dimensional data scenarios. Moreover, we show that the sparse inverse copula covariance learning, with the parameters of the marginal densities fixed, becomes an instance of convex optimization.
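Concretely, with the marginals held fixed, this subproblem takes the standard graphical-lasso form

$$\min_{\boldsymbol{\Lambda} \succ 0} \; \mathrm{Tr}(\mathbf{S} \boldsymbol{\Lambda}) - \log \det \boldsymbol{\Lambda} + \lambda \sum_{i \neq j} |\Lambda_{i,j}|,$$

where $\mathbf{S}$ is the second-moment matrix of the copula-linked features described next and $\lambda$ controls the sparsity level; the objective is convex in $\boldsymbol{\Lambda}$.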

Furthermore, the learning problem can be framed exactly as the existing sparse inverse covariance learning for linear Gaussian models by replacing the empirical second-order moments with statistics based on the marginal copula-linked features. From a computational perspective, we can thus benefit from existing fast-convergent sparse inverse covariance learning algorithms (e.g., [9]) without any modification. In a nutshell, our copula-based sparse density model is endowed with rich representational power and, at the same time, avoids overfitting through loose statistical dependency modeling via a sparse inverse covariance in the copula function. This promising aspect is demonstrated on several interesting real-world multiple output regression tasks.
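To illustrate this reduction, the following is a minimal sketch in Python, assuming rank-based empirical CDF marginals in place of the paper's input-conditioned mixtures of experts: each output is Gaussianized by normal scores and the result is handed to an off-the-shelf graphical lasso. The function name and penalty value are illustrative, not from the paper.

    # Minimal sketch of the copula-Gaussianization + graphical-lasso step.
    # Assumptions: marginals estimated by empirical CDFs (ranks) rather than
    # the paper's input-conditioned mixtures of experts; alpha is illustrative.
    from scipy.stats import norm, rankdata
    from sklearn.covariance import GraphicalLasso

    def sparse_copula_precision(Y, alpha=0.1):
        """Y: (n, p) array of output samples; returns a sparse copula precision."""
        n = Y.shape[0]
        # (i) empirical marginal CDF values in (0, 1) via per-column ranks
        U = rankdata(Y, axis=0) / (n + 1.0)
        # (ii) normal scores z = Phi^{-1}(F(y)): the copula-linked features
        Z = norm.ppf(U)
        # (iii) sparse inverse covariance of the normal scores, solved by an
        #       off-the-shelf graphical lasso without modification
        model = GraphicalLasso(alpha=alpha).fit(Z)
        return model.precision_

The zero pattern of the returned precision matrix is then read off as the loose conditional dependency structure among the outputs.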

The paper is organized as follows. After describing the formal problem setup and notation in Section 1.1, we briefly discuss recent related work on sparse inverse covariance learning for Gaussian models in Section 2. In Section 3, our conditional copula model is introduced, for which we propose a sparse learning algorithm based on block coordinate descent. In Section 4, experimental results on several regression tasks are provided, contrasting the proposed model against existing approaches.

We consider multiple output regression, where the main goal is to predict the response vector y ∈ ℝ^p from the input feature vector (or the covariates) x ∈ ℝ^d. One straightforward approach is to apply standard scalar-output regression, estimating a regression model for each output dimension independently. However, this can lead to a suboptimal solution in that the underlying inter-correlation among the output (random) variables is ignored. Properly capturing the statistical dependency of the response variables is the key to yielding better prediction models. We are particularly interested in situations where the input/output dimensions (d and p) are relatively large compared to the training sample size. This implies that one can obtain overfitted models unless certain regularization or constraints are properly imposed on the model space.

For notation, we use boldfaced symbols for vectors and matrices, and plain letters for scalar values. Also, y_i, for i = 1, …, p, indicates the i-th entry of y, while y_C, for a set C ⊆ {1, …, p}, stands for the set of variables in y whose indices belong to C. We denote by N(z; m, V) the multivariate Gaussian density with mean m and covariance V. For a matrix A, A_{i,j} indicates the (i,j)-entry of A. The determinant and trace of a positive definite matrix Σ (i.e., Σ ≻ 0) are denoted by det(Σ) and Tr(Σ), respectively.

Related work on sparse inverse covariance learning

In this section, we briefly review related work on sparse inverse covariance learning for Gaussian random field models. Beginning with general sparse Gaussian density estimation (Section 2.1), we then describe its extension to conditional linear Gaussian models for multiple output prediction tasks (Section 2.2).

Sparse learning of conditional copula models

The proposed sparse learning algorithm for a conditional copula density model is described here. We begin by introducing our model, which incorporates Gaussian mixtures for the marginal densities to yield a non-Gaussian, non-linear model. We then show that within the proposed model, the conditional independency of the output variables is directly controlled by the sparseness of the inverse covariance in the copula function. Based on this observation, we formulate an L1-regularized sparse inverse covariance learning problem.

Evaluations

In this section we empirically demonstrate the prediction performance of the proposed approach on several synthetic and real-world multiple output regression problems.

The performance of the proposed sparse copula-based density model is compared with the state-of-the-art approaches based on sparse inverse covariance estimation of Gaussian models. We summarize the competing methods as follows.

  • SpCopula: The proposed sparse copula-based conditional density model. We follow the alternating optimization of the marginal densities and the copula inverse covariance.

Conclusion

In this paper we have proposed a novel sparse conditional density model based on copula modeling. Compared to existing sparse inverse covariance learning approaches that are built on the strong linear Gaussian model assumption, our model enjoys high representational power, while conditional independency of the output variables is directly controlled and imposed by sparse learning of the inverse covariance in the copula function. The model thus has the capability of representing complex input/output relationships without over-fitting, as demonstrated by its superior prediction performance on several synthetic and real-world multiple output regression problems.

Conflict of Interest

The author has no conflict of interest.

Acknowledgments

This study was supported by the National Research Foundation of Korea (NRF-2013R1A1A1076101).

Minyoung Kim received the BS and MS degrees, both in Computer Science and Engineering, from Seoul National University, South Korea. He earned the PhD degree in Computer Science from Rutgers University in 2008. From 2009 to 2010 he was a postdoctoral researcher at the Robotics Institute of Carnegie Mellon University. He is currently an Assistant Professor in the Department of Electronics and IT Media Engineering at Seoul National University of Science and Technology in Korea. His primary research interest is machine learning and computer vision. His research focus includes graphical models, motion estimation/tracking, discriminative models/learning, kernel methods, and dimensionality reduction.

References (38)

  • R. Jenatton et al., Structured variable selection with sparsity-inducing norms, J. Mach. Learn. Res. (2011)
  • S. Kim et al., Statistical estimation of correlated genome associations to a quantitative trait network, PLoS Genet. (2009)
  • G. Obozinski et al., Support union recovery in high-dimensional multivariate regression, Ann. Stat. (2011)
  • O. Banerjee et al., Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, J. Mach. Learn. Res. (2008)
  • J. Friedman et al., Sparse inverse covariance estimation with the graphical lasso, Biostatistics (2008)
  • C.-J. Hsieh, M.A. Sustik, I.S. Dhillon, P. Ravikumar, Sparse inverse covariance matrix estimation using quadratic...
  • X. Yuan et al., Partial Gaussian graphical model estimation, IEEE Trans. Inf. Theory (2014)
  • K.-A. Sohn, S. Kim, Joint estimation of structured sparsity and output structure in multiple-output regression via...
  • M. Wytock, J.Z. Kolter, Sparse Gaussian conditional random fields: algorithms, theory, and application to energy...
  • H. Joe, Multivariate Models and Multivariate Dependence Concepts (1997)
  • R.B. Nelsen, An Introduction to Copulas (1999)
  • P. Embrechts et al., Correlation and dependence in risk management: properties and pitfalls
  • U. Cherubini et al., Copula Methods in Finance (2004)
  • D. Thompson et al., Estimating joint flow probabilities at stream confluences using copulas, Transp. Res. Rec. (2011)
  • C. Schölzel et al., Multivariate non-normally distributed random variables in climate research: introduction to the copula approach, Nonlinear Process. Geophys. (2008)
  • M.I. Jordan et al., Hierarchical mixtures of experts and the EM algorithm, Neural Comput. (1994)
  • C.M. Bishop, M. Svensén, Bayesian hierarchical mixtures of experts, in: Uncertainty in Artificial Intelligence,...
  • H. Rue et al., Gaussian Markov Random Fields: Theory and Applications (2005)
  • J.M. Hammersley, P. Clifford, Markov fields on finite graphs and lattices, 1971,...