Extending the rand, adjusted rand and jaccard indices to fuzzy partitions

Journal of Intelligent Information Systems

Abstract

The first stage of knowledge acquisition and complexity reduction for a group of entities is to partition the entities into groups, or clusters, based on their attributes or characteristics. Clustering is one of the most basic processes performed in simplifying data and expressing knowledge in a scientific endeavor. It is akin to defining classes. Since the output of clustering is a partition of the input data, the quality of that partition must be determined in order to measure the quality of the partitioning (clustering) process. The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. This paper looks at some commonly used clustering measures, including the Rand index (RI), the adjusted Rand index (ARI) and the Jaccard index (JI), that are already defined for crisp clustering, and extends them to fuzzy clustering measures, giving FRI, FARI and FJI. These new indices give the same values as the original indices in the special case of crisp clustering. The extension is made by first finding equivalent expressions for the parameters a, b, c and d of these indices in the case of crisp clustering. To this end, a relationship called bonding, which describes the degree to which two cluster members are in the same cluster or class, is first defined. The effectiveness of the indices is demonstrated through their use in both crisp and fuzzy clustering.
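As a rough illustration of the quantities named in the abstract, the sketch below computes the standard pair counts a, b, c and d for two crisp partitions and derives RI, JI and ARI from them using their usual pair-count formulas. The function `soft_pair_counts` is a hypothetical extension: it assumes that the bonding between two objects can be taken as the cosine of their membership vectors, which equals 1 or 0 for crisp (one-hot) memberships. That choice is an assumption made for illustration; the abstract does not state how the paper defines bonding. All function names here are illustrative.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels1, labels2):
    """Count object pairs by agreement between two crisp partitions.

    a: same cluster in both, b: same in first only,
    c: same in second only, d: different in both.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            a += 1
        elif same1:
            b += 1
        elif same2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(a, b, c, d):
    return (a + d) / (a + b + c + d)


def jaccard_index(a, b, c, d):
    return a / (a + b + c)


def adjusted_rand_index(a, b, c, d):
    # Pair-count form of the Hubert and Arabie adjusted Rand index.
    return 2 * (a * d - b * c) / ((a + b) * (b + d) + (a + c) * (c + d))


def bonding(u, v):
    """ASSUMPTION: bonding taken as the cosine of two membership vectors.

    Requires non-zero membership vectors.
    """
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def soft_pair_counts(memberships1, memberships2):
    """Hypothetical fuzzy analogue of pair_counts.

    Each argument is a list with one membership vector per object.
    For one-hot vectors this reduces to the crisp counts.
    """
    a = b = c = d = 0.0
    for i, j in combinations(range(len(memberships1)), 2):
        b1 = bonding(memberships1[i], memberships1[j])
        b2 = bonding(memberships2[i], memberships2[j])
        a += b1 * b2
        b += b1 * (1 - b2)
        c += (1 - b1) * b2
        d += (1 - b1) * (1 - b2)
    return a, b, c, d


# Usage: two crisp partitions of four objects.
counts = pair_counts([0, 0, 1, 1], [0, 0, 1, 2])
print(counts)  # (1, 1, 0, 4)
print(rand_index(*counts), jaccard_index(*counts), adjusted_rand_index(*counts))
```

Because the three index functions take only a, b, c and d, the same functions can be applied unchanged to the real-valued counts returned by `soft_pair_counts`, which mirrors the abstract's idea of extending the indices through their parameters.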

Acknowledgement

The support of a Natural Sciences and Engineering Research Council grant (227338-04) from the Canadian Government is greatly appreciated, as is the support of the Department of Mechanical and Mechatronics Engineering of the University of Stellenbosch. The work of the reviewers in making this a better paper is also appreciated.

Author information

Corresponding author

Correspondence to Roelof K. Brouwer.

About this article

Cite this article

Brouwer, R.K. Extending the rand, adjusted rand and jaccard indices to fuzzy partitions. J Intell Inf Syst 32, 213–235 (2009). https://doi.org/10.1007/s10844-008-0054-7

  • DOI: https://doi.org/10.1007/s10844-008-0054-7
