Sparse conditional copula models for structured output regression
Introduction
We consider the multiple output (or vector output) regression problem, where the goal is to predict multiple response variables from input covariates. In contrast to conventional single (scalar) output regression in statistics and machine learning, the central challenge is to capture the statistical correlation among the output variables efficiently. In this sense, the simple strategy of estimating a single output regression model independently for each output variable is suboptimal, since it completely ignores the output correlation when learning a predictive model. In the machine learning literature, there has been considerable recent research on estimating more accurate multiple output regressors beyond this independent treatment [1], [2], [3].
From the practitioner's point of view, effective output correlation modeling in regression is highly beneficial in several applications. For instance, in denoising handwritten character images, the output vector comprises the pixel intensities of a character image, where neighboring dimensions tend to have similar values (as either foreground character pixels or background), clearly exhibiting strong conditional dependency among the output variables. Likewise in computer vision, in the motion estimation problem of predicting motion for the next few frames (an instance of a typical forecasting problem), the data exhibit high statistical correlation among the output variables, since image features at two consecutive frames typically conform to a motion smoothness constraint.
However, when the input/output data are high dimensional, one has to trade off faithful output correlation modeling against model complexity. One sensible approach is to confine the model parameter space with sparseness constraints, which motivates the recently emerging sparse inverse covariance estimation [4], [5], [6], [7], [8], [9]. Under Gaussian random field models (of either joint or conditional densities), these approaches impose sparseness on the inverse covariance matrix based on the following fact: a Gaussian density factorizes into pairwise potentials on output variables (say, y_i and y_j), with the corresponding coefficient equal (up to a constant) to the (i, j) entry of the inverse covariance matrix. Therefore, having many zero entries in the inverse covariance yields a sparse statistical dependency structure.
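As a small numerical illustration of this fact (a toy example, not from the paper; the chain-structured precision matrix below is made up), zeros in the precision matrix Q correspond exactly to vanishing partial correlations, i.e., conditional independencies, even though the covariance itself is dense:

```python
import numpy as np

# Toy chain-structured precision matrix Q (tridiagonal, positive definite).
# Zeros in Q encode conditional independence: y_i is independent of y_j
# given the remaining variables iff Q[i, j] == 0.
Q = np.array([[2.0, -0.8,  0.0,  0.0],
              [-0.8, 2.0, -0.8,  0.0],
              [0.0, -0.8,  2.0, -0.8],
              [0.0,  0.0, -0.8,  2.0]])

Sigma = np.linalg.inv(Q)  # the corresponding (dense) covariance matrix

# Partial correlation of y_i and y_j given the rest:
# rho_ij = -Q[i, j] / sqrt(Q[i, i] * Q[j, j]); it vanishes exactly where Q does.
partial_corr = -Q / np.sqrt(np.outer(np.diag(Q), np.diag(Q)))

print(np.round(Sigma, 3))         # marginally, all pairs are correlated
print(np.round(partial_corr, 3))  # but only chain neighbors are conditionally dependent
```

Note that Sigma[0, 2] is nonzero (marginal dependence) while the partial correlation between variables 0 and 2 is exactly zero, matching the zero in Q.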
While sparse inverse covariance estimation has achieved superb prediction performance, its main weakness is that most approaches are built on the strong assumption that the output is linear Gaussian given the input. Although this assumption leads to convex optimization, the linear Gaussian model family is inflexible and considerably restricted in representational capacity. Consequently, when the true data generating process is far from linear Gaussian, these methods may suffer from inaccurate prediction.
In this paper, we propose a novel sparse copula-based density model that admits much richer non-Gaussian and non-linear densities. Copula models have been studied extensively in the statistics community [10], [11], [12], [13], [14], [15], while their application in machine learning is relatively rare. A main difficulty in multivariate density modeling is choosing a proper function family beyond the multivariate Gaussian; the copula framework alleviates this by decoupling the task into two subtasks: (i) building marginal distributions for the individual variables, which is easy since the modeling is univariate, and (ii) modeling the inter-correlation among the variables with the so-called copula function. The copula function can be any function satisfying certain uniform-marginal constraints.
One may argue that defining the copula function is as difficult as choosing a multivariate density. However, the most popular strategy is to adopt the Gaussian copula function. It is known that as long as the marginal densities are not linear Gaussian, the resulting density is non-Gaussian. Moreover, by choosing sensible (non-Gaussian) marginal densities, the model can be endowed with high representational power beyond the Gaussian, providing a great deal of flexibility in modeling a non-linear and non-Gaussian multivariate distribution.
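To illustrate this flexibility (a hedged sketch, not the paper's model; the exponential and gamma marginals and the correlation value below are arbitrary choices), one can sample latent Gaussians with a chosen copula correlation and push the uniformized marginals through non-Gaussian inverse CDFs:

```python
import numpy as np
from scipy import stats

# Gaussian copula with arbitrary correlation 0.7 (an illustrative choice).
rng = np.random.default_rng(0)
R = np.array([[1.0, 0.7],
              [0.7, 1.0]])

z = rng.multivariate_normal(np.zeros(2), R, size=20000)
u = stats.norm.cdf(z)               # uniform marginals: the copula sample itself

# Non-Gaussian marginals via inverse CDFs (illustrative choices).
y1 = stats.expon.ppf(u[:, 0])       # exponential marginal
y2 = stats.gamma.ppf(u[:, 1], a=3)  # gamma marginal

# The dependence induced by the copula survives the marginal transforms,
# while each marginal is clearly non-Gaussian (right-skewed).
print(np.corrcoef(y1, y2)[0, 1])
print(stats.skew(y1), stats.skew(y2))
```

The joint density of (y1, y2) here is neither Gaussian nor linear in the latent variables, yet its dependence structure is still governed by the single copula correlation matrix R.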
Indeed, the proposed model employs mixtures of sigmoid-weighted Gaussians as the univariate marginal densities. Also known as the mixture of experts [16], [17], this model divides the input space into different regimes, each delineated by multiple hyperplanes; in each regime, the output variable follows a mixture of Gaussians, a rich density family that can approximate any density to high precision.
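A minimal sketch of such a gated-mixture marginal may help fix ideas (softmax gating over hyperplane scores is used here as one sigmoid-style choice; all parameter values are made-up toy numbers, not the paper's):

```python
import numpy as np

def gauss_pdf(y, mu, var):
    """Univariate Gaussian density, vectorized over components."""
    return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def marginal_density(y, x, W, b, mus, vars_):
    """p(y|x) = sum_k g_k(x) N(y; mu_k, var_k), with softmax gating g_k(x)
    computed from hyperplane scores W @ x + b (one hyperplane per regime)."""
    scores = W @ x + b
    gates = np.exp(scores - scores.max())
    gates /= gates.sum()                 # soft assignment of x to regimes
    return float(np.sum(gates * gauss_pdf(y, mus, vars_)))

# Two experts on a 2-d input; component means differ across regimes.
W = np.array([[3.0, 0.0], [-3.0, 0.0]])  # gating hyperplanes (toy values)
b = np.zeros(2)
mus = np.array([-2.0, 2.0])              # per-regime Gaussian means
vars_ = np.array([0.5, 0.5])

x = np.array([1.0, 0.0])                 # input deep inside expert 0's regime
print(marginal_density(-2.0, x, W, b, mus, vars_))  # high: near expert 0's mean
print(marginal_density(2.0, x, W, b, mus, vars_))   # low: expert 1 is gated off
```

Moving x across the gating hyperplanes shifts the mixture weights, so the conditional marginal density changes shape with the input, which is exactly the non-linear, non-Gaussian behavior the model targets.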
In addition to the enriched representational capacity from flexible density modeling, an intriguing property of the proposed model is that conditional independence among the output variables can be imposed directly via a sparse inverse covariance in the copula function. Imposing this sparseness when learning the density model is the central idea of our approach, and it is beneficial for capturing the most salient output correlations in high-dimensional data scenarios. Moreover, we show that learning the sparse inverse copula covariance with the parameters of the marginal densities fixed is an instance of convex optimization.
Furthermore, the learning problem can be cast exactly as the existing sparse inverse covariance learning problem for linear Gaussian models by replacing the empirical second-order moments with statistics based on the marginal copula-linked features. From a computational perspective, we can therefore benefit from existing fast-convergent sparse inverse covariance learning algorithms (e.g., [9]) without any modification. In a nutshell, our copula-based sparse density model is endowed with rich representational power while avoiding overfitting through sparse statistical dependency modeling via the inverse covariance in the copula function. This promising aspect is demonstrated on several interesting real-world multiple output regression tasks.
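The reduction can be sketched as follows (a hedged illustration: the marginal CDFs here are estimated empirically from ranks, nonparanormal-style, rather than with the paper's fitted mixture marginals). The resulting second-order statistics S can then be fed to any off-the-shelf sparse inverse covariance (graphical lasso) solver unchanged:

```python
import numpy as np
from scipy import stats

# Simulated toy data: non-Gaussian outputs generated through monotone
# transforms of correlated latent Gaussians (all choices here are illustrative).
rng = np.random.default_rng(1)
n = 5000
latent = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=n)
y = np.column_stack([np.exp(latent[:, 0]),   # log-normal output
                     latent[:, 1] ** 3])     # heavy-tailed output

# Copula-linked features: z_i = Phi^{-1}(F_i(y_i)), with F_i a rescaled
# empirical CDF (an assumption standing in for the fitted marginals).
ranks = stats.rankdata(y, axis=0)
z = stats.norm.ppf(ranks / (n + 1))

# These statistics replace the usual empirical second-order moments;
# plugging S into an existing graphical-lasso solver needs no modification.
S = np.cov(z, rowvar=False)
print(np.round(S, 2))  # close to the latent correlation despite non-Gaussian y
```

Because the rank transform is invariant to the monotone marginal distortions, S recovers the latent Gaussian dependence that the sparse precision matrix is then estimated from.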
The paper is organized as follows: after describing the formal problem setup and notation in Section 1.1, we briefly discuss recent related work on sparse inverse covariance learning for Gaussian models in Section 2. In Section 3, our conditional copula model is introduced, together with the proposed sparse learning algorithm based on block coordinate descent. In Section 4, experimental results on several regression tasks are provided, contrasting the proposed model with existing approaches.
We consider multiple output regression, where the main goal is to predict the response vector (of dimension p) from the input feature vector, or covariates (of dimension d). One straightforward approach is to apply standard scalar-output regression, estimating a regression model for each output dimension independently. However, this can lead to a suboptimal solution in that the underlying inter-correlation among the output (random) variables is ignored. Properly capturing the statistical dependency of the response variables is the key to yielding better prediction models. We are particularly interested in situations where the input/output dimensions (d and p) are relatively large compared to the training sample size, which implies that models can overfit unless certain regularization or constraints are properly imposed on the model space.
For notation, we use boldface symbols for vectors and matrices, and plain letters for scalar values. Also, y_i, for i = 1, ..., p, indicates the i-th entry of y, while y_C, for an index set C, stands for the subvector of y whose indices belong to C. We denote by N(m, V) the multivariate Gaussian density with mean m and covariance V. For a matrix A, A_ij indicates the (i, j)-entry of A. The determinant and trace of a positive definite matrix A are denoted by |A| and tr(A), respectively.
Related work on sparse inverse covariance learning
In this section we briefly review some related work on sparse inverse covariance learning for Gaussian random field models and related models. Beginning with general sparse Gaussian density estimation (Section 2.1), we describe its extension to conditional linear Gaussian models for multiple output prediction tasks (Section 2.2).
Sparse learning of conditional copula models
The proposed sparse learning algorithm for the conditional copula density model is described. We begin by introducing our model, which incorporates Gaussian mixtures for the marginal densities to yield a non-Gaussian, non-linear model. We then show that, within the proposed model, conditional independence among the output variables is directly controlled by the sparseness of the inverse covariance in the copula function. Based on this observation, we formulate an L1-regularized sparse inverse covariance estimation problem.
Evaluations
In this section we empirically demonstrate the prediction performance of the proposed approach on several synthetic and real-world multiple output regression problems.
The performance of the proposed sparse copula-based density model is compared with the state-of-the-art approaches based on sparse inverse covariance estimation of Gaussian models. We summarize the competing methods as follows.
- SpCopula: The proposed sparse copula-based conditional density model. We follow the alternating optimization scheme, alternating between fitting the marginal densities and the sparse inverse covariance in the copula function.
Conclusion
In this paper we have proposed a novel sparse conditional density model based on copula modeling. Compared to existing sparse inverse covariance learning approaches, which are built on the strong linear Gaussian model assumption, our model enjoys high representational power, while conditional independence among the output variables is directly controlled and imposed by sparse learning of the inverse covariance in the copula function. The model can thus represent complex input/output relationships without overfitting, and its superior prediction performance was demonstrated on several synthetic and real-world multiple output regression problems.
Conflict of Interest
The author has no conflict of interest.
Acknowledgments
This study was supported by the National Research Foundation of Korea (NRF-2013R1A1A1076101).
Minyoung Kim received the BS and MS degrees, both in Computer Science and Engineering, from Seoul National University, South Korea. He earned the PhD degree in Computer Science from Rutgers University in 2008. From 2009 to 2010 he was a postdoctoral researcher at the Robotics Institute of Carnegie Mellon University. He is currently an Assistant Professor in the Department of Electronics and IT Media Engineering at Seoul National University of Science and Technology in Korea. His primary research interest is machine learning and computer vision. His research focus includes graphical models, motion estimation/tracking, discriminative models/learning, kernel methods, and dimensionality reduction.
References (38)
- et al., Structured variable selection with sparsity-inducing norms, J. Mach. Learn. Res. (2011)
- et al., Statistical estimation of correlated genome associations to a quantitative trait network, PLoS Genet. (2009)
- et al., Support union recovery in high-dimensional multivariate regression, Ann. Stat. (2011)
- et al., Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, J. Mach. Learn. Res. (2008)
- et al., Sparse inverse covariance estimation with the graphical lasso, Biostatistics (2008)
- C.-J. Hsieh, M.A. Sustik, I.S. Dhillon, P. Ravikumar, Sparse inverse covariance matrix estimation using quadratic...
- et al., Partial Gaussian graphical model estimation, IEEE Trans. Inf. Theory (2014)
- K.-A. Sohn, S. Kim, Joint estimation of structured sparsity and output structure in multiple-output regression via...
- M. Wytock, J.Z. Kolter, Sparse Gaussian conditional random fields: algorithms, theory, and application to energy...
- Multivariate Models and Multivariate Dependence Concepts (1997)
- An Introduction to Copulas
- Correlation and dependence in risk management: properties and pitfalls
- Copula Methods in Finance
- Estimating joint flow probabilities at stream confluences using copulas, Transp. Res. Rec.
- Multivariate non-normally distributed random variables in climate research: introduction to the copula approach, Nonlinear Process. Geophys.
- Hierarchical mixtures of experts and the EM algorithm, Neural Comput.
- Gaussian Markov Random Fields: Theory and Applications