Skip to main content
Log in

Globally Optimal Clusterwise Regression By Column Generation Enhanced with Heuristics, Sequencing and Ending Subset Optimization

  • Published:
Journal of Classification Aims and scope Submit manuscript

Abstract

A column generation based approach is proposed for solving the cluster-wise regression problem. The proposed strategy relies firstly on several efficient heuristic strategies to insert columns into the restricted master problem. If these heuristics fail to identify an improving column, an exhaustive search is performed starting with incrementally larger ending subsets, all the while iteratively performing heuristic optimization to ensure a proper balance of exact and heuristic optimization. Additionally, observations are sequenced by their dual variables and by their inclusion in joint pair branching rules. The proposed strategy is shown to outperform the best known alternative (BBHSE) when the number of clusters is greater than three. Additionally, the current work further demonstrates and expands the successful use of the new paradigm of using incrementally larger ending subsets to strengthen the lower bounds of a branch and bound search as pioneered by Brusco's Repetitive Branch and Bound Algorithm (RBBA).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • ALOISE, D., CAFIERI, S., CAPOROSSI G., HANSEN, P., PERRON, S., LIBERTI, L. (2010a), "Column Generation Algorithms for Exact Modularity Maximization in Networks," Physical Review E, 82, 046112.

    Article  Google Scholar 

  • ALOISE, D., HANSEN, P., and LIBERTI, L. (2010b), "An Improved Column Generation Algorithm for Minimum Sum-of-Squares Clustering", Mathematical Programming, 1–26.

  • AURIFEILLE, J.M. (2000), "A Bio-Mimetic Approach to Marketing Segmentation: Principles and Comparative Analysis," European Journal of Economic and Social Systems, 14, 93–108.

    Article  MATH  Google Scholar 

  • AURIFEILLE, J.M., and MEDLIN, C. (2001), "A Dyadic Segmentation Approach to Business Partnerships," European Journal of Economic and Social Systems, 15, 3–16.

    Article  MATH  Google Scholar 

  • AURIFEILLE, J.M., and QUESTER, P.G. (2003), "Predicting Business Ethical Tolerance in International Markets: A Concomitant Clusterwise Regression Analysis," International Business Review, 12, 253–272.

    Article  Google Scholar 

  • BARNHART, C., JOHNSON, E.L., NEMHAUSER, G.L., SAVELSBERGH, M.W.P., and VANCE, P.H. (1998), "Branch-and-Price: Column Generation for Solving Huge Integer Programs," Operations Research, 46, 316–329.

    Article  MATH  MathSciNet  Google Scholar 

  • BERKELAAR, M., EIKLAND, K., and NOTEBAERT, P. (2010), "Lp_Solve: Open Source (Mixed-Integer) Linear Programming System," http://lpsolve.sourceforge.net/5.5/

  • BRUSCO, M.J. (2006), "A Repetitive Branch-and-Bound Procedure for Minimum Within-Cluster Sums of Squares Partitioning," Psychometrika, 71, 347–363.

    Article  MathSciNet  Google Scholar 

  • BRUSCO, M.J., CRADIT, J.D., STEINLEY, D., and FOX, G.L. (2008), "Cautionary Remarks on the Use of Clusterwise Regression," Multivariate Behavioral Research, 43, 29–49.

    Article  Google Scholar 

  • BRUSCO, M.J., and STAHL, S. (2005), Branch-and-Bound Applications in Combinatorial Data Analysis, New York: Springer Verlag.

    MATH  Google Scholar 

  • CAPOROSSI, G., and HANSEN, P. (2005), "Variable Neighborhood Search for Least Squares Clusterwise Regression," Les Cahiers du GERAD, G-2005-61.

  • CARBONNEAU, R.A., CAPOROSSI, G., and HANSEN, P. (2011a), "Extensions to the Repetitive Branch and Bound Algorithm for Globally Optimal Clusterwise Regression," GERAD, G-2011-10.

  • CARBONNEAU, R.A., CAPOROSSI, G., and HANSEN, P. (2011b), "Globally Optimal Clusterwise Regression by Mixed Logical-Quadratic Programming," European Journal of Operational Research, 212, 213–222.

    Article  MathSciNet  Google Scholar 

  • CHARLES, C. (1977), "Régression Typologique Et Reconnaissance Des Formes," Thèse de doctorat 3ième cycle, Université de Paris IX.

  • CHVÁTAL, V. (1983), Linear Programming, New York: WH Freeman.

    MATH  Google Scholar 

  • CIAMPI, A., RICH, B., DYACHENKO, A., ANTONIANO, I., MURIE, C., AND NADON, R. (2007), "Locally Linear Regression and the Calibration Problem for Micro-Array Analysis," in Selected Contributions in Data Analysis and Classification, eds. P. Brito, G. Cucumel, P. Bertrand and F. de Carvalho, Berlin: Springer, pp. 549–555.

    Chapter  Google Scholar 

  • DANTZIG, G.B., and WOLFE, P. (1960), "Decomposition Principle for Linear Programs," Operations Research, 8, 101–111.

    Article  MATH  Google Scholar 

  • DESARBO, W. (1988), "A Maximum Likelihood Methodology for Clusterwise Linear Regression," Journal of Classification, 5, 249–282.

    Article  MATH  MathSciNet  Google Scholar 

  • DESARBO, W., OLIVER, R., and RANGASWAMY, A. (1989), "A Simulated Annealing Methodology for Clusterwise Linear Regression," Psychometrika, 54, 707–736.

    Article  Google Scholar 

  • DESAULNIERS, G., DESROSIERS, J., and SOLOMON, M.M. (eds.) (2005), Column Generation (Vol. 5), Berlin: Springer Verlag.

    MATH  Google Scholar 

  • DIDAY, E. (1979), Optimization En Classification Automatique, Le Chesnay: INRIA.

    Google Scholar 

  • GENTLEMAN, W.M. (1973), "Least Squares Computations by Givens Transformations without Square Roots," IMA Journal of Applied Mathematics, 12, 329–336.

    Article  MATH  MathSciNet  Google Scholar 

  • HANSEN, P., and MEYER, C. (2011), "A New Column Generation Algorithm for Logical Analysis of Data," Annals of Operations Research, 188, 215–249.

    Article  MATH  Google Scholar 

  • HENNIG, C. (2000), "Identifiablity of Models for Clusterwise Linear Regression," Journal of Classification, 17, 273–296.

    Article  MATH  MathSciNet  Google Scholar 

  • HOOKER, J.N. (2002), "Logic, Optimization, and Constraint Programming," INFORMS Journal on Computing, 14, 295–321.

    Article  MATH  MathSciNet  Google Scholar 

  • HOOKER, J.N. (2007), Integrated Methods for Optimization, New York: Springer.

    MATH  Google Scholar 

  • HOOKER, J.N., and OSORIO, M.A. (1999), "Mixed Logical-Linear Programming," Discrete Applied Mathematics, 96, 395–442.

    Article  MathSciNet  Google Scholar 

  • HOOKER, J.N., OTTOSSON, G., THORSTEINSSON, E.S., and KIM, H.J. (2000), "A Scheme for Unifying Optimization and Constraint Satisfaction Methods," The Knowledge Engineering Review, 15, 11–30.

    Article  Google Scholar 

  • IBM (2009a), Ibm Ilog Cplex 12.1.0, Armonk, NY: IBM.

    Google Scholar 

  • IBM (2009b), Ibm Ilog Opl 6.3, Armonk, NY: IBM.

    Google Scholar 

  • KNUTH, D.E. (1997), "Art of Computer Programming, Volume 2: Seminumerical Algorithms" (3rd ed.), Addison-Wesley Professional.

  • LAU, KN., LEUNG, P.L., and TSE, K.K. (1999), "A Mathematical Programming Approach to Clusterwise Regression Model and Its Extensions," European Journal of Operational Research, 116, 640–652.

    Article  MATH  Google Scholar 

  • LÜBBECKE, M.E., and DESROSIERS, J. (2005), "Selected Topics in Column Generation," Operations Research, 53, 1007–1023.

    Article  MATH  MathSciNet  Google Scholar 

  • MIRKIN, B. (2005), Clustering for Data Mining, London: Chapman & Hall/CRC.

    Book  MATH  Google Scholar 

  • RYAN, D.M., and FOSTER, B.A. (1981), "An Integer Programming Approach to Scheduling," in Computer Scheduling of Public Transport Urban Passenger Vehicle and Crew Scheduling, Amsterdam: North-Holland, pp. 269–280.

    Google Scholar 

  • SPÄTH, H. (1979), "Algorithm 39 Clusterwise Linear Regression," Computing, 22, 367–373.

    Article  MATH  MathSciNet  Google Scholar 

  • SPÄTH, H. (1981), "Correction to Algorithm 39 Clusterwise Linear Regression," Computing, 26, 275–275.

    Article  MATH  MathSciNet  Google Scholar 

  • SPÄTH, H. (1982), "A Fast Algorithm for Clusterwise Linear Regression," Computing, 29, 175–181.

    Article  MATH  Google Scholar 

  • VAN HENTENRYCK, P., LUSTIG, I., MICHEL, L., and PUGET, J.F. (1999), The Opl Optimization Programming Language, Cambridge: MIT Press.

    Google Scholar 

  • WEDEL, M. (1990), "Clusterwise Regression and Market Segmentation. Developments and Applications," doctoral thesis, Landbouwuniversiteit Wageningen.

  • WEDEL, M. (1998), Glimmix: Simultaneous Estimation of Latent Classes and Generalized Models within Each Latent Class, User's Manual, Version 1.0, Groningen: ProGAMMA.

    Google Scholar 

  • WEDEL, M., and DESARBO, W.S. (1995), "A Mixture Likelihood Approach for Generalized Linear Models," Journal of Classification, 12, 21–55.

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Réal A. Carbonneau.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Carbonneau, R.A., Caporossi, G. & Hansen, P. Globally Optimal Clusterwise Regression By Column Generation Enhanced with Heuristics, Sequencing and Ending Subset Optimization. J Classif 31, 219–241 (2014). https://doi.org/10.1007/s00357-014-9155-x

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00357-014-9155-x

Keywords

Navigation