
Clusterwise elastic-net regression based on a combined information criterion

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Many research questions pertain to a regression problem in which the population under study is not homogeneous with respect to the underlying model. In this setting, we propose an original method called Combined Information criterion CLUSterwise elastic-net regression (Ciclus). This method handles several methodological and application-related challenges. It is derived from both information theory and microeconomic utility theory, and it maximizes a well-defined criterion combining three weighted sub-criteria, each related to a specific aim: obtaining a parsimonious partition, obtaining compact clusters for better prediction of cluster membership, and achieving a good within-cluster regression fit. The solving algorithm is monotonically convergent under mild assumptions. The Ciclus principle provides an innovative solution to two key issues: (i) the automatic optimization of the number of clusters and (ii) the proposal of a prediction model. We applied it to elastic-net regression in order to handle high-dimensional data involving redundant explanatory variables. Ciclus is illustrated through both a simulation study and a real example in the field of omics data, showing how it improves the quality of prediction and facilitates interpretation. It should therefore prove useful whenever the data involve a population mixture, for example in biology, social sciences, economics or marketing.
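To make the idea of clusterwise elastic-net regression concrete, the following is a minimal sketch of a generic alternating scheme: fit one elastic-net model per cluster, then reassign each observation to the cluster whose model predicts it best. It is not the Ciclus algorithm. It assumes a fixed number of clusters K and uses only the within-cluster fit, whereas Ciclus combines three weighted sub-criteria and optimizes the number of clusters itself. The function names (soft_threshold, elastic_net, clusterwise_elastic_net), the penalty parametrization (lam, alpha) and the toy data are illustrative assumptions, not taken from the article.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator, the lasso part of the elastic-net update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam=0.01, alpha=0.5, n_sweeps=100):
    """Coordinate descent (assumed parametrization) for
    (1/(2n)) * sum_i (y_i - b0 - x_i.beta)^2
        + lam * (alpha * |beta|_1 + (1 - alpha) / 2 * |beta|_2^2).
    Returns (b0, beta); the intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered predictors
    resid = [yi - y_mean for yi in y]  # residual of centered y when beta = 0
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in Xc]
            z = sum(c * c for c in col) / n
            # covariance of feature j with the partial residual
            rho = sum(c * r for c, r in zip(col, resid)) / n + z * beta[j]
            denom = z + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            resid = [r - delta * c for r, c in zip(resid, col)]
            beta[j] = new
    b0 = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return b0, beta


def predict(model, x):
    b0, beta = model
    return b0 + sum(b * v for b, v in zip(beta, x))


def clusterwise_elastic_net(X, y, labels, K, lam=0.01, alpha=0.5, max_rounds=50):
    """Alternate per-cluster elastic-net fits and best-fit reassignment.
    `labels` is the initial partition (hypothetical input, values in 0..K-1)."""
    labels = list(labels)
    models = [None] * K
    for _ in range(max_rounds):
        for k in range(K):
            idx = [i for i, lab in enumerate(labels) if lab == k]
            if idx:  # an emptied cluster keeps its previous model
                models[k] = elastic_net([X[i] for i in idx],
                                        [y[i] for i in idx], lam, alpha)
        fitted = [k for k in range(K) if models[k] is not None]
        new_labels = [
            min(fitted, key=lambda k: (y[i] - predict(models[k], X[i])) ** 2)
            for i in range(len(y))
        ]
        if new_labels == labels:  # partition is stable: stop
            break
        labels = new_labels
    return labels, models


# Toy data (assumption): two groups following y = 2x and y = -2x.
xs = (1, 2, 3, 4, 5)
X = [[float(v)] for v in xs] * 2
y = [2.0 * v for v in xs] + [-2.0 * v for v in xs]
start = [1, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # first observation deliberately mislabeled
labels, models = clusterwise_elastic_net(X, y, start, K=2)
print(labels)
print(models)
```

On this toy data the deliberately mislabeled observation should move back to the group generated by y = 2x, and the fitted slopes should sit slightly below 2 in absolute value because of the shrinkage induced by the penalty. Assigning observations by their residual requires the response to be known, which is one reason a dedicated prediction model for cluster membership, as the abstract describes, is needed for new data.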


Author information

Corresponding author

Correspondence to Stéphanie Bougeard.

Appendix A: Illustration of two simulated situations


See Figs. 6 and 7.

Fig. 6. Illustration of the simulated situation s5 (described in Table 2)

Fig. 7. Illustration of the simulated situation s7 (described in Table 2)

Cite this article

Bry, X., Niang, N., Verron, T. et al. Clusterwise elastic-net regression based on a combined information criterion. Adv Data Anal Classif 17, 75–107 (2023). https://doi.org/10.1007/s11634-021-00489-w

  • DOI: https://doi.org/10.1007/s11634-021-00489-w
