Extensions to the repetitive branch and bound algorithm for globally optimal clusterwise regression

https://doi.org/10.1016/j.cor.2012.02.007

Abstract

A branch and bound strategy is proposed for solving the clusterwise regression problem, extending Brusco's repetitive branch and bound algorithm (RBBA). The resulting strategy relies upon iterative heuristic optimization, new ways of sequencing observations, and branch and bound optimization of a limited number of ending subsets. These three key features lead to significantly faster optimization of the complete set, and the strategy has applications more general than clusterwise regression alone. Additionally, an efficient implementation of incremental calculations within the branch and bound search eliminates most redundant computations. Experiments using both real and synthetic data compare the various features of the proposed optimization algorithm and contrast them against a benchmark mixed logical-quadratic programming formulation optimized by CPLEX. The results indicate that all components of the proposed algorithm provide significant improvements in processing times and, when combined, generally provide the best performance, significantly outperforming CPLEX.

Introduction

Clusterwise regression is a clustering technique in which multiple lines or hyperplanes are fit to mutually exclusive subsets of a dataset such that the sum of squared errors (SSE) from each observation to its cluster's line is minimized [1], [2], [3]. The term "line" is used in this paper for both lines and hyperplanes. Clusterwise regression is relevant to areas such as spline estimation, utility function clustering, and response-based segmentation of customers, markets, regions, subjects, strategies or investors [1], [2], [3], [4], [5]. Optimization for clusterwise regression is considered "a tough combinatorial optimization problem" [4], and the only currently known feasible method for global optimization appears to be mixed logical-quadratic programming (MLQP) [6].

A new paradigm, the repetitive branch and bound algorithm (RBBA), has recently been proposed by Brusco and Stahl for clustering, seriation and variable selection [7], [8]. It works by sequencing the data and then optimizing by branch and bound (BB) a series of problems corresponding to ending subsets, each containing one more observation than the previous one [8]. An ending subset is a sequential subset of observations whose last observation is also the last observation of the complete set. The values of the solved subproblems are used to strengthen the lower bounds of the search at the current iteration. Building on this previous work, the present paper proposes an extended branch and bound strategy that combines iterative heuristic optimization, new ways of sequencing the observations, and branch and bound optimization of a limited number of ending subsets. These three key features lead to significantly faster optimization of the complete set, and the strategy has applications more general than clusterwise regression alone.
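To make the ending-subset scheme concrete, the following is a minimal sketch of the RBBA outer loop, assuming a hypothetical solve_by_branch_and_bound callback and an ending_subset_values table; it is illustrative only and not the authors' implementation.

```python
# Hedged sketch of the repetitive branch and bound (RBBA) idea described above.
# solve_by_branch_and_bound() and ending_subset_values are illustrative
# assumptions, not the authors' implementation.

def rbba(observations, solve_by_branch_and_bound):
    """Optimize progressively larger ending subsets of a sequenced dataset.

    observations: list already placed in the chosen sequence.
    solve_by_branch_and_bound(subset, lower_bounds): returns the optimal
        objective value of the subproblem restricted to `subset`, using
        `lower_bounds` (values of previously solved ending subsets) to
        prune its search.
    """
    n = len(observations)
    ending_subset_values = {}          # maps subset size -> optimal value
    # Ending subsets all finish with the last observation of the complete set:
    # [o_n], [o_{n-1}, o_n], ..., [o_1, ..., o_n].
    for size in range(1, n + 1):
        subset = observations[n - size:]
        value = solve_by_branch_and_bound(subset, ending_subset_values)
        ending_subset_values[size] = value
    # The final iteration solves the complete set; its value is the optimum.
    return ending_subset_values[n]
```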

Clusterwise regression is a cubic optimization problem defined by the number of clusters (K), the number of independent dimensions (D), and the number of observations (O). The iterators are a cluster ($k \in \{1,\dots,K\}$), an independent dimension ($d \in \{1,\dots,D\}$), and an observation ($o \in \{1,\dots,O\}$). The model parameters are the independent variable for an observation and dimension ($x_{od}$) and the dependent variable for an observation ($y_o$). The model variables are the assignment of an observation to a cluster ($z_{ok}$), the regression coefficient (aka $\beta$) for a dimension of a cluster ($b_{dk}$) and the error for an observation of a cluster ($e_{ok}$). The cubic model is as follows:

$$\mathrm{SSE} = \min \sum_{k=1}^{K} \sum_{o=1}^{O} z_{ok}\, e_{ok}^{2} \tag{1}$$

subject to

$$\sum_{d=1}^{D} b_{dk}\, x_{od} + e_{ok} = y_{o}, \qquad o = 1,\dots,O,\; k = 1,\dots,K \tag{2}$$

$$\sum_{k=1}^{K} z_{ok} = 1, \qquad o = 1,\dots,O \tag{3}$$

$$z_{ok} \in \{0,1\}, \qquad o = 1,\dots,O,\; k = 1,\dots,K \tag{4}$$

The objective (1) is the minimization, over all clusters, of the sum of squared errors (SSE) of their observations relative to their regression line. Constraint (2) fits the regression lines to the data by adjusting the coefficient and error terms. An observation can only be assigned to one cluster at a time (3) and the cluster assignment is binary (4). This formulation does not explicitly include an intercept, but one can be added to the model simply by augmenting the data with a variable whose value is a constant one. All models in this paper include an intercept, thus D is always one more than the number of independent variables in the original dataset. Since there are $K^{O}$ possible clustering configurations and since a minimum of $D^{2}O$ regression computations [9] must be performed per clustering configuration, enumerating the complete problem search space requires at least $K^{O} D^{2} O$ operations.
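To make the formulation concrete, the sketch below evaluates the objective (1) for a fixed cluster assignment by fitting each cluster's line with ordinary least squares. The function name clusterwise_sse and the use of NumPy are illustrative assumptions; this is a sketch of the model, not of the optimization algorithm itself.

```python
# Illustrative evaluation of the objective (1)-(4) for a *given* assignment z:
# each cluster's coefficients b_k are the ordinary least squares fit to the
# observations assigned to it, and the objective is the total squared error.
import numpy as np

def clusterwise_sse(X, y, assignment, K):
    """X: (O, D) matrix including a column of ones for the intercept.
    y: (O,) vector of dependent values.
    assignment: (O,) array of cluster indices in {0, ..., K-1}.
    Returns the sum over clusters of each cluster's regression SSE.
    """
    sse = 0.0
    for k in range(K):
        members = np.where(assignment == k)[0]
        if members.size == 0:
            continue                      # empty cluster contributes no error
        Xk, yk = X[members], y[members]
        # Least squares fit of this cluster's line (constraint (2)).
        b_k, *_ = np.linalg.lstsq(Xk, yk, rcond=None)
        residuals = yk - Xk @ b_k
        sse += float(residuals @ residuals)
    return sse

# Tiny usage example with two clusters and an intercept column (D = 2):
X = np.column_stack([np.ones(6), np.array([0.0, 1, 2, 3, 4, 5])])
y = np.array([0.0, 1, 2, 10, 8, 6])       # two different linear trends
print(clusterwise_sse(X, y, np.array([0, 0, 0, 1, 1, 1]), K=2))
```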

Although identifying the globally optimal solution to a clusterwise regression problem by no means guarantees identifying the true model, on average these globally optimal solutions lead to better models than the random local optima identified by heuristics. However, as stressed by Brusco et al., clusterwise regression makes no effort to distinguish between error explained by clustering and error explained by regression [10]. Also, since clusterwise regression fits multiple lines to the data, the overfitting potential is much greater than that of a single regression line. Consequently, an evaluation procedure has been proposed to test whether overfitting occurs [10]. Nevertheless, evaluating and addressing this overfitting problem is beyond the scope of the current research, as is the statistical validity of the identified optimal clusterwise regression models. This research considers only the feasibility of, and processing time for, finding the optimal solution to a clusterwise multiple linear regression problem (OCMLR).

This paper is structured as follows: Section 2 provides an overview of previous heuristic and exact optimization approaches; Section 3 details the proposed exact global optimization strategy; Section 4 describes the experimental protocol and datasets; Section 5 presents the results and related discussion; and Section 6 presents the conclusions.


Heuristics

Various heuristics have been applied to solve the clusterwise regression problem. The exchange method, which is stepwise optimal but not globally optimal, consists of tentatively moving each observation from its cluster to each of the other clusters, keeping only the reassignments that reduce the error. This is repeated until a complete pass over the observations yields no improvement [1], [2], [3], [11], [12]. The simulated annealing (SA) [13], variable neighborhood search (VNS) [14], and
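As an illustration, the following is a minimal sketch of the exchange method just described, under the assumption that the total SSE is recomputed from scratch with the clusterwise_sse helper sketched in the introduction; efficient implementations would instead update the regressions incrementally.

```python
# Minimal sketch of the exchange heuristic (stepwise optimal, not globally
# optimal). Reuses the hypothetical clusterwise_sse() helper sketched earlier;
# recomputing full regressions on every tentative move is deliberately naive.

def exchange_heuristic(X, y, assignment, K):
    """Repeatedly move single observations to other clusters while the total
    SSE decreases; stop after a full pass with no improvement."""
    assignment = assignment.copy()
    best = clusterwise_sse(X, y, assignment, K)
    improved = True
    while improved:
        improved = False
        for o in range(len(y)):
            current = assignment[o]
            for k in range(K):
                if k == current:
                    continue
                assignment[o] = k                     # tentative reassignment
                candidate = clusterwise_sse(X, y, assignment, K)
                if candidate < best:
                    best, current = candidate, k      # keep the improvement
                    improved = True
                else:
                    assignment[o] = current           # undo the move
    return assignment, best
```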

Proposed exact global optimization strategy

A branch and bound algorithm can also be used for solving the clusterwise regression problem optimally. Although this is a difficult task, symmetry breaking, identifying stronger bounds and controlling the path through the search space can reduce the actual size of the search, while incremental regression calculations reduce the number of operations needed for each evaluation. The upper bound can be strengthened by heuristic optimization. The lower bound can be strengthened by exact global optimization of ending subsets.
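The sketch below illustrates how these pieces can fit together in a branch and bound search. The partial_sse callback, the ending_subset_values table and the particular symmetry-breaking rule shown are assumptions made for illustration; they do not reproduce the paper's incremental implementation.

```python
# Hedged sketch of a branch and bound search along the lines outlined above:
# observations are assigned one at a time, partial assignments are pruned
# against the incumbent (upper bound from a heuristic), and the optimal values
# of already solved ending subsets bound the unassigned remainder from below.
import numpy as np

def branch_and_bound(X, y, K, incumbent_value, ending_subset_values, partial_sse):
    """Return the optimal clusterwise SSE not worse than incumbent_value.

    ending_subset_values[r]: optimal value of the ending subset made of the
        last r observations in the sequence (0 when unavailable).
    partial_sse(X, y, assignment, n_assigned): SSE of the first n_assigned
        observations under the partial assignment.
    """
    O = len(y)
    assignment = np.zeros(O, dtype=int)
    best = incumbent_value

    def recurse(o, clusters_used):
        nonlocal best
        if o == O:
            best = min(best, partial_sse(X, y, assignment, O))
            return
        bound_on_rest = ending_subset_values.get(O - o, 0.0)
        # Symmetry breaking: observation o may open at most one new cluster.
        for k in range(min(clusters_used + 1, K)):
            assignment[o] = k
            lower = partial_sse(X, y, assignment, o + 1) + bound_on_rest
            if lower < best:              # prune branches that cannot improve
                recurse(o + 1, max(clusters_used, k + 1))

    recurse(0, 0)
    return best
```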

Experimental protocol

As detailed in the section on observation sequencing, because of large variations in processing times it is important to always use appropriate statistics when comparing the processing times of various algorithms. Consequently, for all of the optimization experiments, the same problem was executed 100 times and the sequence of observations was randomized each time. Once the total processing time for a specific problem and algorithm had passed beyond 100 h, it was aborted and considered timed out.

Results and discussions

The results of optimizing the real datasets into two and three clusters are presented in Table 3, which indicates that the BBHSE algorithm provides a significant performance advantage over CPLEX and the simpler branch and bound algorithms (BB and BB.h). However, the results also indicate that the heuristic and sequencing alone (BB.h.s1, BB.h.s2) often do as well as, or even slightly better than, adding the ending subset searches (BB.h.s1.e, BB.h.s2.e). In addition, the more general sequencing strategy

Conclusion

The results indicate that the proposed strategy combining heuristic optimization, observation sequencing and global optimization of ending subsets (BBHSE) provides significant performance advantages over all currently available alternatives. The choice of observation sequencing has a major impact on performance, and two sequencing algorithms are proposed. The first and more general rule is to sequence the observations by descending error within their cluster, with forced alternation of clusters. The second rule,

References (46)

  • W.M. Gentleman, Least squares computations by Givens transformations without square roots, IMA Journal of Applied Mathematics (1973).
  • M.J. Brusco et al., Cautionary remarks on the use of clusterwise regression, Multivariate Behavioral Research (2008).
  • H. Späth, Correction to algorithm 39 clusterwise linear regression, Computing (1981).
  • H. Späth, A fast algorithm for clusterwise linear regression, Computing (1982).
  • W. DeSarbo et al., A simulated annealing methodology for clusterwise linear regression, Psychometrika (1989).
  • G. Caporossi et al., Variable neighborhood search for least squares clusterwise regression, Les Cahiers du GERAD (2005).
  • J.M. Aurifeille, A bio-mimetic approach to marketing segmentation: principles and comparative analysis, European Journal of Economic and Social Systems (2000).
  • J.M. Aurifeille et al., A dyadic segmentation approach to business partnerships, European Journal of Economic and Social Systems (2001).
  • A. Ciampi et al., Locally linear regression and the calibration problem for micro-array analysis.
  • B. Mirkin, Clustering for data mining (2005).
  • W. DeSarbo, A maximum likelihood methodology for clusterwise linear regression, Journal of Classification (1988).
  • M. Wedel, Clusterwise regression and market segmentation: developments and applications, Doctoral thesis, Wageningen: ...
  • M. Wedel et al., A mixture likelihood approach for generalized linear models, Journal of Classification (1995).