Abstract
Many studies that address the problem of selecting or weighting variables for cluster analysis assume that all the variables define a single classification of the units. However, different subsets of variables may also yield different classifications of the same units. In this paper this problem is considered from a model-based perspective. The limitations and drawbacks of standard latent class cluster analysis are highlighted, and a new procedure able to overcome these difficulties is proposed. Results obtained by applying this procedure to simulated and real data sets are presented and discussed.
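The idea that different subsets of variables can define different classifications of the same units can be illustrated with a toy sketch. This is not the procedure proposed in the paper and it fits no latent class model: it simply groups units by their exact value pattern on a chosen set of columns. The data, the column split, and the helper names `partition_by` and `rand_index` are assumptions made up for the illustration.

```python
from itertools import combinations


def partition_by(units, cols):
    """Label each unit by its value pattern on the chosen columns."""
    labels = {}
    return [labels.setdefault(tuple(u[c] for c in cols), len(labels)) for u in units]


def rand_index(a, b):
    """Share of unit pairs on which two partitions agree (same vs. different group)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical data: 8 units, 4 binary variables.
# Columns 0-1 follow one grouping of the units, columns 2-3 follow another.
units = [
    (0, 0, 0, 0),
    (0, 0, 1, 1),
    (0, 0, 0, 0),
    (0, 0, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 1, 1),
]

first = partition_by(units, (0, 1))         # classification from the first subset
second = partition_by(units, (2, 3))        # classification from the second subset
joint = partition_by(units, (0, 1, 2, 3))   # classification from all variables

print("first subset :", first)
print("second subset:", second)
print("all variables:", joint)
print("Rand index between the two subsets:", round(rand_index(first, second), 3))
```

In this constructed example each subset splits the units into two groups, but the two splits cross each other, so using all four variables at once produces four groups and hides the fact that two separate two-group structures are present.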
Copyright information
© 2006 Springer Berlin · Heidelberg
About this paper
Cite this paper
Galimberti, G., Soffritti, G. (2006). Identifying Multiple Cluster Structures Through Latent Class Models. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_20
DOI: https://doi.org/10.1007/3-540-31314-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)