Extensions to the repetitive branch and bound algorithm for globally optimal clusterwise regression
Introduction
Clusterwise regression is a clustering technique in which multiple lines or hyperplanes are fitted to mutually exclusive subsets of a dataset such that the sum of squared errors (SSE) from each observation to its cluster's line is minimized [1], [2], [3]. The term line will be used in this paper for both lines and hyperplanes. Clusterwise regression is relevant to areas such as spline estimation, utility function clustering, and response-based segmentation of customers, markets, regions, subjects, strategies or investors [1], [2], [3], [4], [5]. Optimization for clusterwise regression is considered “a tough combinatorial optimization problem” [4], and the only currently known feasible method for global optimization appears to be mixed logical-quadratic programming (MLQP) [6].
A new paradigm, the repetitive branch and bound algorithm (RBBA), has recently been proposed by Brusco and Stahl for clustering, seriation and variable selection [7], [8]. It works by sequencing the data and optimizing, by branch and bound (BB), a series of problems corresponding to ending subsets with one more observation at a time [8]. An ending subset is a sequential subset of observations whose last observation is also the last observation of the complete set. The values of the solved subproblems are used to strengthen the lower bounds of the search at the current iteration. Building on this previous work, the present paper proposes an extended branch and bound strategy that combines iterative heuristic optimization, new ways of sequencing the observations, and branch and bound optimization of a limited number of ending subsets. These three key features lead to significantly faster optimization of the complete set, and the strategy has applications beyond clusterwise regression.
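The ending-subset mechanism can be sketched on the toy objective of [8]: partitioning one-dimensional data into at most K clusters to minimize the within-cluster sums of squares. The code below is an illustrative reconstruction, not the authors' implementation; the function names and the recursive branching scheme are assumptions. Each suffix of the (already sequenced) data is solved by branch and bound from shortest to longest, and each solved suffix optimum strengthens the lower bound of the next, longer problem.

```python
# Illustrative RBBA sketch on a toy problem: partition 1-D points into at most
# K clusters, minimizing the within-cluster sum of squares (WCSS).
def wcss(points):
    """WCSS of a single cluster around its mean."""
    if not points:
        return 0.0
    m = sum(points) / len(points)
    return sum((p - m) ** 2 for p in points)

def rbba(data, K):
    n = len(data)
    # suffix_opt[i] holds the optimal value of the ending subset data[i:];
    # 0.0 is a trivially valid lower bound before that suffix is solved.
    suffix_opt = [0.0] * (n + 1)
    for start in range(n - 1, -1, -1):  # ending subsets, one more observation each time
        best = [float("inf")]

        def branch(i, clusters):
            cost = sum(wcss(c) for c in clusters)
            # Prune: cost so far + known optimum of the remaining suffix.
            if cost + suffix_opt[i] >= best[0]:
                return
            if i == n:
                best[0] = cost
                return
            for c in clusters:              # assign data[i] to an existing cluster
                c.append(data[i])
                branch(i + 1, clusters)
                c.pop()
            if len(clusters) < K:           # or open a new cluster (breaks symmetry)
                clusters.append([data[i]])
                branch(i + 1, clusters)
                clusters.pop()

        branch(start, [])
        suffix_opt[start] = best[0]
    return suffix_opt[0]                    # optimal value of the complete set
```

The pruning is valid because the WCSS of a cluster is at least the sum of the WCSS of any partition of it, so the cost of completing a partial solution can never fall below the optimum of the remaining suffix.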
Clusterwise regression is a cubic optimization problem defined by the number of clusters (K), the number of independent dimensions (D), and the number of observations (O). The iterators are a cluster (k∈{1,…,K}), an independent dimension (d∈{1,…,D}), and an observation (o∈{1,…,O}). The model parameters are the independent variable for an observation and dimension (xod) and the dependent variable for an observation (yo). The model variables are the cluster assignment of an observation to a cluster (zok), the regression coefficient (aka β) for a dimension of a cluster (bdk), and the error for an observation of a cluster (eok). The cubic model is as follows:

minimize Σk Σo zok·eok²  (1)
subject to Σd bdk·xod + eok = yo, for all o, k  (2)
Σk zok = 1, for all o  (3)
zok ∈ {0, 1}, for all o, k  (4)
The objective (1) is the minimization, over all clusters, of the sum of squared errors (SSE) of their observations relative to their regression line. Constraint (2) fits the regression lines to the data by adjusting the coefficient and error terms. An observation can only be assigned to one cluster at a time (3) and the cluster assignment is binary (4). This formulation does not explicitly require an intercept, but one can be included simply by adding a variable with a constant value of one to the data. All models in this paper include an intercept, thus D is always one more than the number of independent variables in the original dataset. Since there are K^O possible clustering configurations, and since a minimum of D²·O regression computations [9] must be performed per clustering configuration, enumerating the complete problem search space requires at least K^O·D²·O operations.
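To make the K^O enumeration concrete, the following brute-force sketch (hypothetical helper names, not from the paper; simple regression with an intercept, i.e. D = 2) enumerates every assignment z and fits each cluster by ordinary least squares. It returns the globally optimal SSE but is practical only for very small O.

```python
# Hypothetical brute-force solver for the cubic model (1)-(4): enumerate all
# K^O cluster assignments, fit each cluster's line by ordinary least squares,
# and keep the assignment with the smallest total SSE.
from itertools import product

def ols_sse(points):
    """SSE of a simple regression y = b0 + b1*x fitted to (x, y) pairs."""
    n = len(points)
    if n == 0:
        return 0.0
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:            # all x identical: the mean of y is optimal
        b1 = 0.0
    else:
        b1 = (n * sxy - sx * sy) / denom
    b0 = (sy - b1 * sx) / n
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

def brute_force_ocmlr(xs, ys, K):
    """Globally optimal clusterwise regression by full enumeration."""
    O = len(xs)
    best_sse, best_z = float("inf"), None
    for z in product(range(K), repeat=O):   # constraints (3)-(4): one cluster per point
        sse = sum(ols_sse([(xs[o], ys[o]) for o in range(O) if z[o] == k])
                  for k in range(K))        # objective (1)
        if sse < best_sse:
            best_sse, best_z = sse, z
    return best_sse, best_z
```

For example, six observations generated exactly from two different lines are separated into the two generating groups with zero SSE; already at O = 30 and K = 3, however, the 3^30 assignments make full enumeration hopeless, which is what motivates branch and bound.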
Although identifying the globally optimal solution to a clusterwise regression problem by no means guarantees identifying the true model, on average these globally optimal solutions will lead to better models than the random local optima identified by heuristics. However, as stressed by Brusco et al., clusterwise regression makes no effort to distinguish between error explained by clustering and error explained by regression [10]. Also, since clusterwise regression fits multiple lines to the data, the overfitting potential is much greater than that of a single regression line. Consequently, an evaluation procedure has been proposed to test whether or not overfitting has occurred [10]. Nevertheless, evaluating and addressing this overfitting problem is not within the scope of the current research, and neither is the statistical validity of identified optimal clusterwise regression models. This research considers only the feasibility and processing time of finding the optimal solution to a clusterwise multiple linear regression problem (OCMLR).
This paper is structured as follows: Section 2 provides an overview of previous heuristic and exact optimization approaches; Section 3 details the proposed exact global optimization strategy; Section 4 describes the experimental protocol and datasets; and Section 5 presents the results and related discussion. Conclusions are presented in Section 6.
Section snippets
Heuristics
Various heuristics have been applied to solve the clusterwise regression problem. The exchange method, which is stepwise optimal but not globally optimal, consists in tentatively moving each observation from its cluster to each other cluster, keeping only the reassignments that reduce the error. This is repeated until a complete pass over the observations does not result in any improvement [1], [2], [3], [11], [12]. The simulated annealing (SA) [13], variable neighborhood search (VNS) [14], and
Proposed exact global optimization strategy
A branch and bound algorithm can also be used to solve the clusterwise regression problem optimally. Although this is a difficult task, symmetry breaking, stronger bounds and control of the path through the search space can reduce the actual size of the search, while incremental regression calculations reduce the number of operations per evaluation. The upper bound can be strengthened by heuristic optimization. The lower bound can be strengthened by exact global
Experimental protocol
As detailed in the section on observation sequencing, the large variations in processing times make it important to always use appropriate statistics when comparing the processing times of the various algorithms. Consequently, for all of the optimization experiments, the same problem was executed 100 times and the sequence of observations was randomized each time. Once the total processing time for a specific problem and algorithm had passed beyond 100 h, it was aborted and considered timed
Results and discussions
The results of optimizing the real datasets into two and three clusters are presented in Table 3, which indicates that the BBHSE algorithm provides a significant performance advantage over CPLEX and simpler branch and bound algorithms (BB and BB.h). However, the results also indicate that the heuristic and sequencing alone (BB.h.s1, BB.h.s2) often do as well as or even slightly better than adding the ending subset searches (BB.h.s1.e, BB.h.s2.e). In addition, the more general sequencing strategy
Conclusion
The results indicate that the proposed strategy of combined heuristic optimization, observation sequencing and global optimization of ending subsets (BBHSE) provides significant performance advantages over all currently available alternatives. The choice of observation sequencing has a major impact on performance, and two sequencing rules are proposed. The first and more general rule is to sequence the observations by descending error in the cluster, with forced alternating of clusters. The second rule,
References (46)
- Lau et al., A mathematical programming approach to clusterwise regression model and its extensions, European Journal of Operational Research (1999)
- Quester, P.G., Predicting business ethical tolerance in international markets: a concomitant clusterwise regression analysis, International Business Review (2003)
- Mixed logical-linear programming, Discrete Applied Mathematics (1999)
- Régression typologique et reconnaissance des formes [Thèse de doctorat 3ième cycle] (1977)
- Optimisation en classification automatique (1979)
- Späth, H., Algorithm 39: Clusterwise linear regression, Computing (1979)
- Identifiability of models for clusterwise linear regression, Journal of Classification (2000)
- Carbonneau et al., Globally optimal clusterwise regression by mixed logical-quadratic programming, European Journal of Operational Research (2011)
- Brusco, M.J., Stahl, S., Branch-and-bound applications in combinatorial data analysis (2005)
- Brusco, M.J., A repetitive branch-and-bound procedure for minimum within-cluster sums of squares partitioning, Psychometrika (2006)
- Least squares computations by Givens transformations without square roots, IMA Journal of Applied Mathematics
- Brusco et al., Cautionary remarks on the use of clusterwise regression, Multivariate Behavioral Research
- Correction to Algorithm 39: Clusterwise linear regression, Computing
- A fast algorithm for clusterwise linear regression, Computing
- DeSarbo et al., A simulated annealing methodology for clusterwise linear regression, Psychometrika
- Variable neighborhood search for least squares clusterwise regression, Les Cahiers du GERAD
- A bio-mimetic approach to marketing segmentation: principles and comparative analysis, European Journal of Economic and Social Systems
- A dyadic segmentation approach to business partnerships, European Journal of Economic and Social Systems
- Locally linear regression and the calibration problem for micro-array analysis
- Clustering for data mining
- DeSarbo, W.S., Cron, W.L., A maximum likelihood methodology for clusterwise linear regression, Journal of Classification
- A mixture likelihood approach for generalized linear models, Journal of Classification