
Sparse inverse covariance learning of conditional Gaussian mixtures for multiple-output regression


Abstract

We consider the task of multiple-output regression where both the input and the output are high-dimensional. Because the number of training samples is limited relative to the data dimensions, properly imposing a loose (sparse) statistical dependency structure when learning a regression model is crucial for reliable prediction accuracy. Sparse inverse covariance learning of conditional Gaussian random fields has recently emerged to achieve this goal and has been shown to outperform non-sparse approaches. However, one of its main drawbacks is the strong assumption of linear Gaussianity in modeling the input-output relationship. For certain application domains this assumption may be too restrictive and insufficiently expressive, and prediction based on a misspecified model can result in suboptimal performance. In this paper, we extend the idea of sparse learning to a non-Gaussian model, specifically the more powerful conditional Gaussian mixture. For this latent-variable model, we propose a novel sparse inverse covariance learning algorithm based on the expectation-maximization (EM) lower-bound optimization technique. We show that each M-step reduces to solving the regular sparse inverse covariance estimation problem of linear Gaussian models, in conjunction with estimating a sparse logistic regression. We demonstrate the improved prediction performance of the proposed algorithm over existing methods on several datasets.
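To make the M-step decomposition described in the abstract concrete, here is a minimal sketch, not the paper's exact algorithm or parameterization: an EM loop for a mixture of linear-Gaussian regression experts in which each M-step fits a sparse precision matrix per component via the graphical lasso and an L1-penalized logistic regression for the gating network. The function name, the component count K, the penalty strengths alpha and C, and the residual-covariance construction are illustrative assumptions, not values or steps taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso
from sklearn.linear_model import LogisticRegression


def em_sparse_cond_gaussian_mixture(X, Y, K=3, alpha=0.1, C=1.0, n_iter=20, seed=0):
    """EM for a K-component mixture of sparse linear-Gaussian regression experts."""
    n, dy = X.shape[0], Y.shape[1]
    rng = np.random.default_rng(seed)
    R = rng.dirichlet(np.ones(K), size=n)                 # responsibilities, (n, K)
    Xb = np.hstack([X, np.ones((n, 1))])                  # inputs with a bias column
    W = [np.zeros((Xb.shape[1], dy)) for _ in range(K)]   # per-component mean maps
    Prec = [np.eye(dy) for _ in range(K)]                 # per-component precisions
    gate = LogisticRegression(penalty="l1", C=C, solver="saga", max_iter=2000)

    for _ in range(n_iter):
        # ---- M-step --------------------------------------------------------
        for k in range(K):
            w = R[:, k] + 1e-8
            sw = np.sqrt(w)[:, None]
            # Weighted least squares for the component mean map: y ≈ [x; 1]^T W_k.
            W[k] = np.linalg.lstsq(sw * Xb, sw * Y, rcond=None)[0]
            resid = Y - Xb @ W[k]
            emp_cov = (resid * w[:, None]).T @ resid / w.sum() + 1e-6 * np.eye(dy)
            # Sparse inverse covariance of the residuals via the graphical lasso.
            _, Prec[k] = graphical_lasso(emp_cov, alpha=alpha)
        # Gating network: L1 logistic regression on responsibility-weighted copies.
        Xg, zg = np.tile(X, (K, 1)), np.repeat(np.arange(K), n)
        gate.fit(Xg, zg, sample_weight=R.T.ravel() + 1e-8)
        # ---- E-step --------------------------------------------------------
        log_pi = gate.predict_log_proba(X)                # gating log-probabilities
        logR = np.empty((n, K))
        for k in range(K):
            cov_k, mean_k = np.linalg.inv(Prec[k]), Xb @ W[k]
            logR[:, k] = log_pi[:, k] + np.array(
                [multivariate_normal.logpdf(Y[i], mean_k[i], cov_k) for i in range(n)]
            )
        logR -= logR.max(axis=1, keepdims=True)           # numerical stabilization
        R = np.exp(logR)
        R /= R.sum(axis=1, keepdims=True)
    return W, Prec, gate
```

For example, calling em_sparse_cond_gaussian_mixture(X, Y, K=2) on standardized data returns the per-component regression weights, the sparse precision matrices (whose zero entries encode conditional independencies among the outputs), and the fitted gating model. The paper's actual M-step optimizes the conditional Gaussian random field parameterization jointly under L1 penalties; this sketch substitutes the closest off-the-shelf pieces (graphical lasso plus L1 logistic regression) only to convey the structure of the algorithm.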


Notes

  1. It states that two random variables z_i and z_j are conditionally independent given a set of other variables z_C (C ⊆ {1, …, k} ∖ {i, j}) if and only if nodes i and j are separated by the node set C in the graph (see the numerical sketch after these notes).

  2. http://mocap.cs.cmu.edu
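
The following small numerical illustration (a toy example, not from the paper) shows the property in note 1 for a Gaussian Markov random field: a zero entry of the precision (inverse covariance) matrix corresponds to zero partial correlation, i.e. conditional independence of the two variables given the rest, even though their marginal covariance is nonzero.

```python
import numpy as np

# Chain-structured precision matrix over (z_1, z_2, z_3): Lambda_13 = 0,
# so node 2 separates nodes 1 and 3 in the graph.
Lam = np.array([[ 2.0, -1.0,  0.0],
                [-1.0,  2.0, -1.0],
                [ 0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Lam)

# Partial correlation of z_1 and z_3 given z_2, read off the precision matrix:
# rho_13|2 = -Lambda_13 / sqrt(Lambda_11 * Lambda_33)
rho_13_given_2 = -Lam[0, 2] / np.sqrt(Lam[0, 0] * Lam[2, 2])
print(rho_13_given_2)   # 0.0 -> conditionally independent given z_2
print(Sigma[0, 2])      # nonzero marginal covariance -> not marginally independent
```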


Author information


Corresponding author

Correspondence to Minyoung Kim.

Additional information

Compliance with Ethical Standards

This work was supported by the National Research Foundation of Korea (NRF-2013R1A1A1076101). The author has no conflict of interest. This research does not involve human participants or animals. Consent to submit this manuscript has been received tacitly from the author's institution, Seoul National University of Science & Technology.


About this article


Cite this article

Kim, M. Sparse inverse covariance learning of conditional Gaussian mixtures for multiple-output regression. Appl Intell 44, 17–29 (2016). https://doi.org/10.1007/s10489-015-0691-9

