Abstract
We consider the task of multiple-output regression where both the input and the output are high-dimensional. Because the number of training samples is limited relative to the data dimensions, properly imposing a sparse statistical dependency structure when learning a regression model is crucial for reliable prediction accuracy. Sparse inverse covariance learning of conditional Gaussian random fields has recently emerged as a way to achieve this goal, and has been shown to outperform non-sparse approaches. However, one of its main drawbacks is the strong assumption of linear Gaussianity in modeling the input-output relationship. For certain application domains this assumption can be too restrictive and representationally weak, and prediction based on a misspecified model can lead to suboptimal performance. In this paper, we extend the idea of sparse learning to a non-Gaussian model, namely the more powerful conditional Gaussian mixture. For this latent-variable model, we propose a novel sparse inverse covariance learning algorithm based on the expectation-maximization (EM) lower-bound optimization technique. We show that each M-step reduces to the regular sparse inverse covariance estimation of a linear Gaussian model, in conjunction with sparse logistic regression estimation. We demonstrate the improved prediction performance of the proposed algorithm over existing methods on several datasets.
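To make the algorithmic recipe in the abstract concrete, the following minimal Python sketch alternates an E-step (soft component assignments) with an M-step that pairs sparse (L1) logistic regression for the gating with a sparse inverse covariance step per component (assuming NumPy, SciPy, and scikit-learn). It is an illustration, not the paper's implementation: the per-component M-step below substitutes a weighted ridge fit for the regression means followed by a graphical-lasso step on the weighted residual covariance, whereas the paper estimates the sparse Gaussian CRF parameters jointly; the function and parameter names (fit_sparse_cgm, alpha, c_gate) are ours.

import numpy as np
from scipy.special import logsumexp
from sklearn.covariance import graphical_lasso
from sklearn.linear_model import LogisticRegression

def fit_sparse_cgm(X, Y, K=2, alpha=0.05, c_gate=1.0, ridge=1e-3,
                   n_iter=25, seed=0):
    """EM for p(y|x) = sum_k pi_k(x) N(y; W_k^T [x,1], Lambda_k^{-1})."""
    N, d = Y.shape
    Xb = np.hstack([X, np.ones((N, 1))])            # inputs with bias column
    R = np.random.default_rng(seed).dirichlet(np.ones(K), size=N)  # soft init
    gate = LogisticRegression(penalty="l1", solver="saga", C=c_gate,
                              max_iter=5000)
    for _ in range(n_iter):
        # M-step (gating): sparse multinomial logistic regression; soft
        # labels are handled by replicating each sample once per component
        # with weight r_nk.
        gate.fit(np.tile(X, (K, 1)), np.repeat(np.arange(K), N),
                 sample_weight=R.T.ravel())

        # M-step (experts): weighted ridge for the means, then a graphical
        # lasso on the weighted residual covariance for a sparse precision.
        Ws, Lams = [], []
        for k in range(K):
            r = R[:, k]
            G = Xb * r[:, None]
            W = np.linalg.solve(Xb.T @ G + ridge * np.eye(Xb.shape[1]),
                                G.T @ Y)
            E = Y - Xb @ W                           # residuals per sample
            S = (E * r[:, None]).T @ E / r.sum() + ridge * np.eye(d)
            _, Lam = graphical_lasso(S, alpha=alpha)  # sparse precision
            Ws.append(W)
            Lams.append(Lam)

        # E-step: responsibilities r_nk ∝ pi_k(x_n) N(y_n; W_k^T [x_n,1], .)
        log_post = np.log(gate.predict_proba(X) + 1e-300)
        for k in range(K):
            E = Y - Xb @ Ws[k]
            _, logdet = np.linalg.slogdet(Lams[k])
            quad = np.einsum("ni,ij,nj->n", E, Lams[k], E)
            log_post[:, k] += 0.5 * (logdet - d * np.log(2 * np.pi) - quad)
        R = np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))
    return gate, Ws, Lams

Given training data (X, Y), one would call fit_sparse_cgm(X, Y, K=3) and predict for a new x by mixing the component means under the gating probabilities, i.e. sum_k pi_k(x) W_k^T [x, 1].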
Notes
This refers to the global Markov property: two random variables $z_i$ and $z_j$ are conditionally independent given a set of other variables $z_C$ ($C \subseteq \{1,\dots,k\} \setminus \{i,j\}$) if and only if nodes $i$ and $j$ are separated by the node set $C$ in the graph.
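In formulas (a standard restatement, assuming a Gaussian Markov random field over $z = (z_1,\dots,z_k)$ with covariance $\Sigma$ and precision matrix $\Lambda = \Sigma^{-1}$):
\[
z_i \perp\!\!\!\perp z_j \mid z_C
\quad\Longleftrightarrow\quad
\text{$C$ separates nodes $i$ and $j$ in the graph,}
\]
and, taking $C = \{1,\dots,k\} \setminus \{i,j\}$, this specializes in the Gaussian case to
\[
z_i \perp\!\!\!\perp z_j \mid z_{\{1,\dots,k\} \setminus \{i,j\}}
\quad\Longleftrightarrow\quad
\Lambda_{ij} = 0,
\]
which is why a sparse inverse covariance (precision) matrix directly encodes a sparse conditional-independence structure.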
Compliance with Ethical Standards
This work was supported by the National Research Foundation of Korea (NRF-2013R1A1A1076101). The author declares no conflict of interest. This research involved neither human participants nor animals. Consent to submit this manuscript was received tacitly from the author's institution, Seoul National University of Science & Technology.
Cite this article
Kim, M. Sparse inverse covariance learning of conditional Gaussian mixtures for multiple-output regression. Appl Intell 44, 17–29 (2016). https://doi.org/10.1007/s10489-015-0691-9