Homogeneity detection for the high-dimensional generalized linear model

https://doi.org/10.1016/j.csda.2017.04.001

Abstract

We propose a penalized estimator for detecting homogeneity in the high-dimensional generalized linear model. Here, homogeneity refers to a specific model structure in which the regression coefficients are partitioned into groups and take exactly the same value within each group. The proposed estimator achieves the weak oracle property under mild regularity conditions and is invariant to the choice of reference levels when the model contains categorical covariates. An efficient algorithm is also provided. Various numerical studies confirm that the proposed penalized estimator outperforms conventional variable selection estimators when the model exhibits homogeneity.

Introduction

For years, penalized estimation methods have been extended to reflect prior knowledge of relationships among covariates (Tibshirani et al., 2005, Yuan and Lin, 2006). A more challenging but interesting problem, however, is grouping covariates without any prior knowledge, which we call homogeneity detection in this paper. Homogeneity detection means identifying a specific model structure in which the regression coefficients are partitioned into groups and take exactly the same value within each group; it is conceptually more general than variable selection (Shen and Huang, 2010, Ke et al., 2015).

Many researchers have developed homogeneity detection techniques for the linear regression model. For example, Bondell and Reich (2008) proposed an octagonal shrinkage and clustering algorithm for regression by using two convex penalties; Petry et al. (2011) and Jang et al. (2013) developed similar techniques called the pairwise least absolute shrinkage and selection operator and the hexagonal operator for regression with shrinkage and equality selection, respectively. Shen and Huang (2010) proposed to use the capped-ℓ1 penalty and proved that the resulting estimator has an oracle property (Shen and Huang, 2010, Shen et al., 2012, Zhu et al., 2013). Similar work can be found in Ke et al. (2015), Petry et al. (2011) and Masarotto and Varin (2012), who used the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and the adaptive version (Zou, 2006) of the least absolute shrinkage and selection operator (LASSO) penalty, respectively.

Homogeneity detection is especially useful when there are categorical covariates in the model. In this case, regression coefficients are interpreted as relative effects on the response with respect to a predefined reference level. Through homogeneity detection we can produce a simpler model by collapsing multiple levels with the same effect (Gertheiss and Tutz, 2010). Moreover, homogeneity detection gives better prediction accuracy than other conventional sparse estimators when the true model has the homogeneity structure (Ke et al., 2015).

In this paper, we propose a penalized estimator for homogeneity detection in the high-dimensional generalized linear model (GLM) that is composed of two non-convex penalties: one inducing individual sparsity and one inducing sparsity of pairwise differences. We consider a class of non-convex penalties that includes most of the existing non-convex penalties considered by previous researchers.

First, we extend homogeneity detection from the linear regression model (Ke et al., 2015, Shen and Huang, 2010) to GLMs. The main challenges are investigating asymptotic properties that support the use of the proposed estimator and developing a computational algorithm when the model is high-dimensional. We prove that the proposed estimator satisfies the weak oracle property (Fan and Lv, 2011, Kwon and Kim, 2012, Kim et al., 2016), a result that is new and covers those of Ke et al. (2015) and Shen and Huang (2010) under mild conditions. We develop an algorithm by applying the concave–convex procedure (Yuille and Rangarajan, 2003, Kim et al., 2008) and the alternating direction method of multipliers (Boyd et al., 2011).
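As an illustration of the concave–convex procedure, the sketch below (our own illustration, not the authors' code) decomposes the SCAD penalty into an ℓ1 part plus a differentiable concave part and checks numerically that the CCCP tangent surrogate lies above the penalty everywhere and touches it at the expansion point:

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty p_lam(|t|) of Fan and Li (2001)."""
    t = np.abs(t)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
                 lam ** 2 * (a + 1) / 2))

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty on t >= 0."""
    t = np.abs(t)
    return np.where(t <= lam, lam,
                    np.where(t <= a * lam, (a * lam - t) / (a - 1), 0.0))

def cccp_majorizer(t, t0, lam, a=3.7):
    """CCCP surrogate: keep the convex part lam*|t| and replace the
    concave part q(t) = p(t) - lam*|t| by its tangent at |t0|."""
    q0 = scad(t0, lam, a) - lam * np.abs(t0)
    slope = scad_deriv(t0, lam, a) - lam
    return lam * np.abs(t) + q0 + slope * (np.abs(t) - np.abs(t0))

lam, t0 = 1.0, 2.0
grid = np.linspace(-6.0, 6.0, 1001)
gap = cccp_majorizer(grid, t0, lam) - scad(grid, lam)
print(gap.min() >= 0)                                            # majorizer lies above the penalty
print(abs(cccp_majorizer(t0, t0, lam) - scad(t0, lam)) < 1e-12)  # and touches it at t0
```

Because the surrogate is convex in β, each CCCP step reduces to a convex problem, which is where the ADMM solver would be applied.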

Second, we prove that the homogeneity structure constructed by the proposed estimator does not depend on the choice of reference levels in the presence of categorical covariates. This offers a new and critical justification for the proposed method of homogeneity detection: the proposed estimator is invariant to the choice of reference levels, a property that does not hold for conventional sparse estimators such as the LASSO and SCAD.
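The intuition behind this invariance can be checked numerically. With the reference level's coefficient included as a zero, changing the reference level merely shifts every coefficient by a constant; the pairwise-difference penalty term is shift-invariant, while a plain ℓ1 term is not. The following small demo (illustrative numbers of our own, using absolute values in place of a general G_λ) makes this concrete:

```python
import numpy as np
from itertools import combinations

# Effects of a 5-level categorical covariate; index 0 is the reference,
# so its coefficient is fixed at zero (illustrative numbers).
beta = np.array([0.0, 1.2, 1.2, -0.5, 3.0])

# Re-code with level 2 as the new reference: every effect shifts by -beta[2].
beta_new = beta - beta[2]

def pairwise(b):
    """Sum of |b_s - b_t| over all pairs, reference level included."""
    return sum(abs(s - t) for s, t in combinations(b, 2))

def l1(b):
    """Sum of |b_j|; with the reference at zero this is the usual L1 term."""
    return np.abs(b).sum()

print(np.isclose(pairwise(beta), pairwise(beta_new)))  # True: shift-invariant
print(np.isclose(l1(beta), l1(beta_new)))              # False: reference-dependent
```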

We organize the paper as follows. Section 2 introduces a penalized MLE and presents a computational algorithm for homogeneity detection in GLMs. Section 3 proves the weak oracle property under regularity conditions. Section 4 studies the invariance property when the model includes categorical covariates. Section 5 presents the results of numerical studies, and concluding remarks follow in Section 6.

Section snippets

Penalized estimation for homogeneity detection

Let $(y_i, x_i^T)^T$, $i = 1, \ldots, n$, be a random sample of response–predictor pairs from a GLM in which the conditional density of $y_i \mid x_i$ is $f(y_i \mid x_i; \beta)$ with $E(y_i \mid x_i; \beta) = \phi^{-1}(x_i^T \beta)$ for a link function $\phi$, and the marginal distribution of $x_i$ does not depend on $\beta$. Given $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$, we consider a penalized MLE defined as $\hat{\beta} = \operatorname{argmin}_\beta Q_{\lambda_1,\lambda_2}(\beta)$, where
$$Q_{\lambda_1,\lambda_2}(\beta) = -\frac{1}{n}\sum_{i=1}^{n} \log f(y_i \mid x_i; \beta) + \sum_{j=1}^{p} G_{\lambda_1}(|\beta_j|) + \sum_{s<t} G_{\lambda_2}(|\beta_s - \beta_t|),$$
and $G_\lambda$ is a non-convex penalty with tuning parameter $\lambda$. The penalized negative log-likelihood $Q_{\lambda_1,\lambda_2}$
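A minimal sketch of the objective $Q_{\lambda_1,\lambda_2}$ for the Bernoulli (logistic) case of the GLM, taking SCAD as one member of the non-convex class $G_\lambda$; the function names and data are ours, for illustration only:

```python
import numpy as np
from itertools import combinations

def scad(t, lam, a=3.7):
    """SCAD penalty, one member of the non-convex class G_lambda."""
    t = np.abs(t)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
                 lam ** 2 * (a + 1) / 2))

def objective(beta, X, y, lam1, lam2):
    """Q_{lam1,lam2}(beta) for a Bernoulli GLM with the logit link."""
    eta = X @ beta
    nll = np.mean(np.log1p(np.exp(eta)) - y * eta)   # -(1/n) sum_i log f(y_i|x_i; beta)
    sparsity = scad(beta, lam1).sum()                # sum_j G_{lam1}(|beta_j|)
    grouping = sum(scad(beta[s] - beta[t], lam2)     # sum_{s<t} G_{lam2}(|beta_s - beta_t|)
                   for s, t in combinations(range(len(beta)), 2))
    return nll + sparsity + grouping

X = np.array([[1.0, -1.0, 0.5], [0.2, 0.3, -0.7]])
y = np.array([1.0, 0.0])
# At beta = 0 both penalty terms vanish and the loss is log 2 per observation.
print(np.isclose(objective(np.zeros(3), X, y, 0.5, 0.5), np.log(2)))  # True
```

Minimizing this non-convex objective is what the CCCP/ADMM algorithm of Section 2 is designed for; the sketch only evaluates it.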

Asymptotic property

In this section, we provide sufficient conditions for local optimality of the oracle MLE and then prove that the oracle MLE becomes a local minimizer of the penalized negative log-likelihood with probability converging to 1.

An application: homogeneity detection in the presence of categorical covariates

In this section we focus on the case where all covariates are categorical and consider the problem of reducing the levels of each categorical variable using the proposed method. For example, we often categorize a continuous variable to accommodate a nonlinear effect, as in the locally constant nonparametric regression model. How many levels to form for a given continuous covariate is then an important problem.

Unlike the usual MLE the penalized MLE can produce a different predictive model depending on

Numerical studies

This section presents the results of numerical studies, including two simulated examples and two real data analyses.

Concluding remarks

In this paper we extend the sparse regularization method for the regression model to the group pursuit method in the GLM with various non-convex penalties. This extension is based on a generalization of local optimality conditions for the log-likelihood function with a large class of non-convex grouping penalties and on an investigation of the asymptotic properties of the oracle estimator under regularity conditions. The regularity conditions require that clustered coefficients are sufficiently far

Acknowledgments

Jeon’s research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF-2016R1C1B1010545) funded by the Ministry of Science, ICT and Future Planning. Kwon’s research was supported by Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning (No. 2014R1A1A1002995). Choi’s research was supported by Basic Science Research Program through the National Research Foundation

References

  • G.-B. Ye et al., Split Bregman method for large scale fused LASSO, Comput. Statist. Data Anal. (2011)
  • I.C. Yeh et al., The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients, Expert Syst. Appl. (2009)
  • R.B. Basnet et al., Learning to detect phishing webpages, J. Internet Serv. Inf. Secur. (2014)
  • A. Beck et al., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imag. Sci. (2009)
  • H.D. Bondell et al., Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR, Biometrics (2008)
  • H.D. Bondell et al., Simultaneous factor selection and collapsing levels in ANOVA, Biometrics (2009)
  • S. Boyd et al., Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. (2011)
  • H. Choi et al., Fused least absolute shrinkage and selection operator for credit scoring, J. Stat. Comput. Simul. (2015)
  • J. Fan et al., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. (2001)
  • J. Fan et al., Nonconcave penalized likelihood with NP-dimensionality, IEEE Trans. Inform. Theory (2011)
  • J. Fan et al., Nonconcave penalized likelihood with a diverging number of parameters, Ann. Statist. (2004)
  • J. Fan et al., Tuning parameter selection in high dimensional penalized likelihood, J. R. Stat. Soc. Ser. B Stat. Methodol. (2013)
  • L. Frank et al., A statistical view of some chemometrics regression tools, Technometrics (1993)
  • J. Gertheiss et al., Sparse modeling of categorial explanatory variables, Ann. Appl. Stat. (2010)
  • T. Goldstein et al., Fast alternating direction optimization methods, SIAM J. Imag. Sci. (2014)
  • Grant, M., Boyd, S., 2010. cvx Users Guide for cvx version 1.21 (build...
  • D.R. Hunter et al., A tutorial on MM algorithms, Amer. Statist. (2004)
  • Jang, W., Lim, J., Lazar, N.A., Loh, J.M., Yu, D., 2013. Regression shrinkage and grouping of highly correlated...
  • T. Ke et al., Homogeneity pursuit, J. Amer. Statist. Assoc. (2015)
  • Y. Kim et al., Smoothly clipped absolute deviation on high dimensions, J. Amer. Statist. Assoc. (2008)
  • Y. Kim et al., A necessary condition for the strong oracle property, Scand. J. Statist. (2016)
  • Y. Kim et al., Global optimality of non-convex penalized estimators, Biometrika (2012)
  • S. Kwon et al., Large sample properties of the SCAD-penalized maximum likelihood estimation on high dimensions, Statist. Sinica (2012)