Abstract
The INDCLUS model is a well-known clustering model for I × I × J data blocks whose J frontal slices are I × I object-by-object similarity matrices. The model implies a grouping of the I objects into a prespecified number of overlapping clusters, with each cluster having a slice-specific positive weight. An INDCLUS model is fitted to a given data set by minimizing a least squares loss function. Minimizing this loss function has proven to be a difficult problem, for which several algorithmic strategies have been proposed. At present, the best available option seems to be the SYMPRES algorithm, which minimizes the loss function by means of block relaxation. Yet SYMPRES is conjectured to suffer from a severe local optima problem. As a way out, five alternative block-relaxation algorithms are proposed, based on theoretical results on the optimal design of block-relaxation algorithms. In a simulation study, the alternative algorithms with overlapping parameter subsets perform best and clearly outperform SYMPRES in terms of optimization performance and cluster recovery.
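To make the least squares criterion concrete, the following is a minimal sketch of how the loss could be evaluated for a candidate solution. It assumes the model part of slice j is the sum over clusters of weight times joint membership, and it omits any additive constant and includes the diagonal entries; those choices, along with the names `indclus_loss`, `A`, `W`, and `S`, are illustrative assumptions and are not taken from the paper.

```python
def indclus_loss(S, A, W):
    """Sum of squared residuals between data S and a model reconstruction.

    S: list of J slices, each an I x I list of lists of similarities.
    A: I x R list of 0/1 memberships (an object may be in several clusters).
    W: J x R list of slice-specific cluster weights.
    Assumption: no additive constant, diagonal entries included.
    """
    n_objects, n_clusters = len(A), len(A[0])
    loss = 0.0
    for j, slice_j in enumerate(S):
        for i in range(n_objects):
            for k in range(n_objects):
                # Objects i and k are similar in slice j to the extent that
                # they share clusters with a large weight in that slice.
                model = sum(W[j][r] * A[i][r] * A[k][r]
                            for r in range(n_clusters))
                loss += (slice_j[i][k] - model) ** 2
    return loss


# Toy example: 3 objects, 2 overlapping clusters (object 1 is in both), 2 slices.
A = [[1, 0], [1, 1], [0, 1]]
W = [[2.0, 1.0], [0.5, 3.0]]

# Build noise-free data from the model itself, so the loss is zero.
S = [[[sum(W[j][r] * A[i][r] * A[k][r] for r in range(2))
       for k in range(3)] for i in range(3)] for j in range(2)]
exact_loss = indclus_loss(S, A, W)

# Perturb a single entry by 1; the loss then equals 1 squared.
S[0][0][1] += 1.0
perturbed_loss = indclus_loss(S, A, W)
```

A block-relaxation algorithm would alternate between updating subsets of the parameters (for instance, parts of `A` and parts of `W`) while holding the rest fixed, evaluating a criterion of this kind at each step.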
Additional information
The research in this paper was partially supported by the Research Fund of KU Leuven (PDM-kort project 3 H100377, dr. Tom F. Wilderjans; GOA 2005/04, Prof. dr. Iven Van Mechelen), by the Belgian Science Policy (IAP P6/03, Prof. dr. Iven Van Mechelen), and by the Fund of Scientific Research (FWO)-Flanders (project G.0546.09, Prof. dr. I. Van Mechelen). The simulation study was conducted using high performance computational resources provided by KU Leuven (http://ludit.kuleuven.be/hpc). We would like to thank Prof. dr. J. de Leeuw for his helpful advice on the topic of block-relaxation algorithms. We would also like to thank three anonymous reviewers for their useful comments and suggestions, which considerably improved earlier versions of this manuscript.
About this article
Cite this article
Wilderjans, T.F., Depril, D. & Van Mechelen, I. Block-Relaxation Approaches for Fitting the INDCLUS Model. J Classif 29, 277–296 (2012). https://doi.org/10.1007/s00357-012-9113-4
DOI: https://doi.org/10.1007/s00357-012-9113-4