Extensions to the repetitive branch and bound algorithm for globally optimal clusterwise regression
Introduction
Clusterwise regression is a clustering technique in which multiple lines or hyperplanes are fitted to mutually exclusive subsets of a dataset such that the sum of squared errors (SSE) from each observation to its cluster's line is minimized [1], [2], [3]. The term line will be used in this paper for both lines and hyperplanes. Clusterwise regression is relevant to areas such as spline estimation, utility function clustering, and response-based segmentation of customers, markets, regions, subjects, strategies or investors [1], [2], [3], [4], [5]. Optimization for clusterwise regression is considered “a tough combinatorial optimization problem” [4], and the only currently known feasible method for global optimization appears to be mixed logical-quadratic programming (MLQP) [6].
A new paradigm, the repetitive branch and bound algorithm (RBBA), has recently been proposed by Brusco and Stahl for clustering, seriation and variable selection [7], [8]. It works by sequencing the data and optimizing, by branch and bound (BB), a series of problems corresponding to ending subsets with one more observation at a time [8]. An ending subset is a sequential subset of observations whose last observation is also the last observation of the complete set. The values of the solved subproblems are used to strengthen the lower bounds of the search at the current iteration. Building on this previous work, the present paper proposes an extended branch and bound strategy that combines iterative heuristic optimization, new ways of sequencing the observations, and branch and bound optimization of a limited number of ending subsets. These three key features lead to significantly faster optimization of the complete set, and the strategy has applications beyond clusterwise regression.
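The ending-subset mechanism can be sketched on the toy objective of [8]: partitioning one-dimensional data into at most K clusters to minimize the within-cluster sums of squares. The code below is an illustrative reconstruction, not the authors' implementation; the function names and the recursive branching scheme are assumptions. Each suffix of the (already sequenced) data is solved by branch and bound from shortest to longest, and each solved suffix optimum strengthens the lower bound of the next, longer problem.

```python
# Illustrative RBBA sketch on a toy problem: partition 1-D points into at most
# K clusters, minimizing the within-cluster sum of squares (WCSS).
def wcss(points):
    """WCSS of a single cluster around its mean."""
    if not points:
        return 0.0
    m = sum(points) / len(points)
    return sum((p - m) ** 2 for p in points)

def rbba(data, K):
    n = len(data)
    # suffix_opt[i] holds the optimal value of the ending subset data[i:];
    # 0.0 is a trivially valid lower bound before that suffix is solved.
    suffix_opt = [0.0] * (n + 1)
    for start in range(n - 1, -1, -1):  # ending subsets, one more observation each time
        best = [float("inf")]

        def branch(i, clusters):
            cost = sum(wcss(c) for c in clusters)
            # Prune: cost so far + known optimum of the remaining suffix.
            if cost + suffix_opt[i] >= best[0]:
                return
            if i == n:
                best[0] = cost
                return
            for c in clusters:              # assign data[i] to an existing cluster
                c.append(data[i])
                branch(i + 1, clusters)
                c.pop()
            if len(clusters) < K:           # or open a new cluster (breaks symmetry)
                clusters.append([data[i]])
                branch(i + 1, clusters)
                clusters.pop()

        branch(start, [])
        suffix_opt[start] = best[0]
    return suffix_opt[0]                    # optimal value of the complete set
```

The pruning is valid because the WCSS of a cluster is at least the sum of the WCSS of any partition of it, so the cost of completing a partial solution can never fall below the optimum of the remaining suffix.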
Clusterwise regression is a cubic optimization problem defined by the number of clusters (K), the number of independent dimensions (D), and the number of observations (O). The iterators are a cluster (k∈{1,…,K}), an independent dimension (d∈{1,…,D}), and an observation (o∈{1,…,O}). The model parameters are the independent variable for an observation and dimension (xod) and the dependent variable for an observation (yo). The model variables are the cluster assignment of an observation to a cluster (zok), the regression coefficient (aka β) for a dimension of a cluster (bdk), and the error for an observation of a cluster (eok). The cubic model is as follows:

minimize Σk Σo zok·eok²  (1)
subject to Σd bdk·xod + eok = yo, for all o, k  (2)
Σk zok = 1, for all o  (3)
zok ∈ {0, 1}, for all o, k  (4)
The objective (1) is the minimization, over all clusters, of the sum of squared errors (SSE) of their observations relative to their regression line. Constraint (2) fits the regression lines to the data by adjusting the coefficient and error terms. An observation can only be assigned to one cluster at a time (3) and the cluster assignment is binary (4). This formulation does not explicitly require an intercept, but one can be included simply by adding a variable with a constant value of one to the data. All models in this paper include an intercept, thus D is always one more than the number of independent variables in the original dataset. Since there are K^O possible clustering configurations, and since a minimum of D²·O regression computations [9] must be performed per clustering configuration, enumerating the complete problem search space requires at least K^O·D²·O operations.
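To make the K^O enumeration concrete, the following brute-force sketch (hypothetical helper names, not from the paper; simple regression with an intercept, i.e. D = 2) enumerates every assignment z and fits each cluster by ordinary least squares. It returns the globally optimal SSE but is practical only for very small O.

```python
# Hypothetical brute-force solver for the cubic model (1)-(4): enumerate all
# K^O cluster assignments, fit each cluster's line by ordinary least squares,
# and keep the assignment with the smallest total SSE.
from itertools import product

def ols_sse(points):
    """SSE of a simple regression y = b0 + b1*x fitted to (x, y) pairs."""
    n = len(points)
    if n == 0:
        return 0.0
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:            # all x identical: the mean of y is optimal
        b1 = 0.0
    else:
        b1 = (n * sxy - sx * sy) / denom
    b0 = (sy - b1 * sx) / n
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

def brute_force_ocmlr(xs, ys, K):
    """Globally optimal clusterwise regression by full enumeration."""
    O = len(xs)
    best_sse, best_z = float("inf"), None
    for z in product(range(K), repeat=O):   # constraints (3)-(4): one cluster per point
        sse = sum(ols_sse([(xs[o], ys[o]) for o in range(O) if z[o] == k])
                  for k in range(K))        # objective (1)
        if sse < best_sse:
            best_sse, best_z = sse, z
    return best_sse, best_z
```

For example, six observations generated exactly from two different lines are separated into the two generating groups with zero SSE; already at O = 30 and K = 3, however, the 3^30 assignments make full enumeration hopeless, which is what motivates branch and bound.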
Although identifying the globally optimal solution to a clusterwise regression problem by no means guarantees identifying the true model, on average these globally optimal solutions will lead to better models than the random local optima identified by heuristics. However, as stressed by Brusco et al., clusterwise regression makes no effort to distinguish between error explained by clustering and error explained by regression [10]. Also, since clusterwise regression fits multiple lines to the data, the overfitting potential is much greater than that of a single regression line. Consequently, an evaluation procedure has been proposed to test whether or not overfitting has occurred [10]. Nevertheless, evaluating and addressing this overfitting problem is not within the scope of the current research, and neither is the statistical validity of identified optimal clusterwise regression models. This research considers only the feasibility and processing time of finding the optimal solution to a clusterwise multiple linear regression problem (OCMLR).
This paper is structured as follows: Section 2 provides an overview of previous heuristic and exact optimization approaches; Section 3 details the proposed exact global optimization strategy; Section 4 describes the experimental protocol and datasets; and Section 5 presents the results and related discussion. Conclusions are presented in Section 6.
Section snippets
Heuristics
Various heuristics have been applied to solve the clusterwise regression problem. The exchange method, which is stepwise optimal but not globally optimal, consists in tentatively moving each observation from its cluster to each other cluster, keeping only the reassignments that reduce the error. This is repeated until a complete pass over the observations does not result in any improvement [1], [2], [3], [11], [12]. The simulated annealing (SA) [13], variable neighborhood search (VNS) [14], and
Proposed exact global optimization strategy
A branch and bound algorithm can also be used to solve the clusterwise regression problem optimally. Although this is a difficult task, symmetry breaking, stronger bounds and control of the path through the search space can reduce the actual size of the search, while incremental regression calculations reduce the number of operations per evaluation. The upper bound can be strengthened by heuristic optimization. The lower bound can be strengthened by exact global
Experimental protocol
As detailed in the section on observation sequencing, the large variations in processing times make it important to always use appropriate statistics when comparing the processing times of the various algorithms. Consequently, for all of the optimization experiments, the same problem was executed 100 times and the sequence of observations was randomized each time. Once the total processing time for a specific problem and algorithm had passed beyond 100 h, it was aborted and considered timed
Results and discussions
The results of optimizing the real datasets into two and three clusters are presented in Table 3, which indicates that the BBHSE algorithm provides a significant performance advantage over CPLEX and simpler branch and bound algorithms (BB and BB.h). However, the results also indicate that the heuristic and sequencing alone (BB.h.s1, BB.h.s2) often do as well as or even slightly better than adding the ending subset searches (BB.h.s1.e, BB.h.s2.e). In addition, the more general sequencing strategy
Conclusion
The results indicate that the proposed strategy of combined heuristic optimization, observation sequencing and global optimization of ending subsets (BBHSE) provides significant performance advantages over all currently available alternatives. The choice of observation sequencing has a major impact on performance, and two sequencing rules are proposed. The first and more general rule is to sequence the observations by descending error in the cluster, with forced alternating of clusters. The second rule,
References (46)
- Lau et al., A mathematical programming approach to clusterwise regression model and its extensions, European Journal of Operational Research (1999)
- Quester, P.G., Predicting business ethical tolerance in international markets: a concomitant clusterwise regression analysis, International Business Review (2003)
- Mixed logical-linear programming, Discrete Applied Mathematics (1999)
- Régression typologique et reconnaissance des formes [Thèse de doctorat 3ième cycle] (1977)
- Optimisation en classification automatique (1979)
- Späth, H., Algorithm 39: Clusterwise linear regression, Computing (1979)
- Identifiability of models for clusterwise linear regression, Journal of Classification (2000)
- Carbonneau et al., Globally optimal clusterwise regression by mixed logical-quadratic programming, European Journal of Operational Research (2011)
- Brusco, M.J., Stahl, S., Branch-and-bound applications in combinatorial data analysis (2005)
- Brusco, M.J., A repetitive branch-and-bound procedure for minimum within-cluster sums of squares partitioning, Psychometrika (2006)
- Least squares computations by Givens transformations without square roots, IMA Journal of Applied Mathematics
- Brusco et al., Cautionary remarks on the use of clusterwise regression, Multivariate Behavioral Research
- Correction to Algorithm 39: Clusterwise linear regression, Computing
- A fast algorithm for clusterwise linear regression, Computing
- DeSarbo et al., A simulated annealing methodology for clusterwise linear regression, Psychometrika
- Variable neighborhood search for least squares clusterwise regression, Les Cahiers du GERAD
- A bio-mimetic approach to marketing segmentation: principles and comparative analysis, European Journal of Economic and Social Systems
- A dyadic segmentation approach to business partnerships, European Journal of Economic and Social Systems
- Locally linear regression and the calibration problem for micro-array analysis
- Clustering for data mining
- DeSarbo, W.S., Cron, W.L., A maximum likelihood methodology for clusterwise linear regression, Journal of Classification
- A mixture likelihood approach for generalized linear models, Journal of Classification