Iterative factor clustering of binary data

Iodice D’Enza, Alfonso; Palumbo, Francesco

doi:10.1007/s00180-012-0329-x

Original Paper
Published: 19 May 2012

Volume 28, pages 789–807 (2013)

Computational Statistics

Alfonso Iodice D’Enza¹ & Francesco Palumbo²

Abstract

Binary data represent a special case in which both distance measures and co-occurrence measures can be adopted. Non-hierarchical methods based on Euclidean distance, such as the k-means algorithm or one of its variants, can be used profitably. As the number of available attributes increases, however, overall clustering performance usually worsens. In such cases, enhancing group separability requires removing irrelevant and redundant (noisy) information from the data. The present approach belongs to the category of attribute transformation strategies: it combines clustering and factorial techniques to identify the attribute associations that characterize one or more homogeneous groups of statistical units. Furthermore, it provides graphical representations that facilitate the interpretation of the results.
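The link between distance and co-occurrence mentioned in the first two sentences can be made concrete with a minimal sketch in standard-library Python. All function names below are hypothetical and not taken from the paper. For two 0/1 attribute vectors, let a, b, c, d count the (1,1), (1,0), (0,1) and (0,0) pairs of the 2×2 co-occurrence table; the squared Euclidean distance then equals the mismatch count b + c, so Euclidean k-means implicitly works with co-occurrence counts. The `kmeans` function is a plain Lloyd iteration with a deterministic initialization (the first k rows), an assumption made only for reproducibility. It does not implement the authors' iterative factorial procedure, and it omits their attribute-transformation step entirely.

```python
from typing import List, Sequence, Tuple


def cooccurrence_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): the number of (1,1), (1,0), (0,1) and (0,0) attribute pairs."""
    pairs = list(zip(x, y))
    a = sum(1 for u, v in pairs if u == 1 and v == 1)
    b = sum(1 for u, v in pairs if u == 1 and v == 0)
    c = sum(1 for u, v in pairs if u == 0 and v == 1)
    d = sum(1 for u, v in pairs if u == 0 and v == 0)
    return a, b, c, d


def squared_euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance; on 0/1 vectors it equals the mismatch count b + c."""
    return float(sum((u - v) ** 2 for u, v in zip(x, y)))


def kmeans(rows: List[List[int]], k: int, max_iter: int = 20) -> List[int]:
    """Plain Lloyd k-means on binary rows; returns one cluster label per row.

    Assumption: the first k rows serve as initial centroids, so results are
    reproducible but depend on row order.
    """
    centroids = [[float(v) for v in row] for row in rows[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_euclidean(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not change
        labels = new_labels
        # Update step: each centroid becomes the column-wise mean of its members,
        # i.e. the within-cluster relative frequency of each attribute.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy data (illustrative only): attributes 0-2 and 3-5 tend to co-occur.
    rows = [
        [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [1, 1, 1, 0, 0, 1],
        [0, 0, 1, 1, 1, 1],
    ]
    print(cooccurrence_counts(rows[0], rows[4]), squared_euclidean(rows[0], rows[4]))
    # prints "(3, 0, 1, 2) 1.0": one mismatch, b + c = 1
    print(kmeans(rows, k=2))
    # prints "[0, 1, 0, 1, 0, 1]"
```

The deterministic start is only there to keep the example reproducible. k-means is known to be sensitive to its initial centroids, so practical runs usually compare several random starts.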

Author information

Authors and Affiliations

Dipartimento di Scienze Economiche, Università di Cassino, Cassino, Italy
Alfonso Iodice D’Enza
Dipartimento di Teoria e Metodi per le Scienze Umane e Sociali, Università degli Studi di Napoli ‘Federico II’, Naples, Italy
Francesco Palumbo

Corresponding author

Correspondence to Francesco Palumbo.

About this article

Cite this article

Iodice D’Enza, A., Palumbo, F. Iterative factor clustering of binary data. Comput Stat 28, 789–807 (2013). https://doi.org/10.1007/s00180-012-0329-x

Received: 04 February 2011
Accepted: 12 April 2012
Published: 19 May 2012
Issue Date: April 2013
DOI: https://doi.org/10.1007/s00180-012-0329-x

Similar content being viewed by others

Multiple Correspondence K-Means: Simultaneous Versus Sequential Approach for Dimension Reduction and Clustering

Cluster Analysis of Data with Reduced Dimensionality: An Empirical Study

Variable Selection in Cluster Analysis: An Approach Based on a New Index
