Fast fuzzy clustering

doi:10.1016/S0165-0114(96)00232-1

Fuzzy Sets and Systems

Volume 93, Issue 1, 1 January 1998, Pages 49-56

https://doi.org/10.1016/S0165-0114(96)00232-1

Abstract

This paper presents a clustering algorithm based on fuzzy c-means with multistage random sampling, which significantly reduces the computation time required to partition a data set into c classes. A series of subsets of the full data set is used to create initial cluster centers that approximate the final cluster centers. The quality of the final partitions is equivalent to that of the partitions created by fuzzy c-means. The speed-up is typically a factor of 2 to 3, which is especially significant for high-dimensional spaces and large data sets. Examples from two multispectral domains, magnetic resonance image segmentation and satellite image segmentation, are compared with fuzzy c-means in terms of both the time required and the final partition, and each example shows a significant speed-up. Further, the convergence properties of fuzzy c-means are preserved.
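The idea in the abstract can be sketched roughly as follows: run fuzzy c-means on small random subsets first, then reuse the resulting centers as the starting point for a final run on the full data set. This is a minimal illustration, not the authors' implementation. The abstract does not state the number of stages, the sample sizes, or the stopping rules, so the `fractions=(0.05, 0.15)` schedule, the tolerance, and the function names `fcm` and `multistage_fcm` are all assumptions made for this example.

```python
import numpy as np

def fcm(X, c, V0=None, m=2.0, tol=1e-4, max_iter=300, rng=None):
    """Plain fuzzy c-means. Returns (centers, memberships, iterations used)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # Start from the supplied centers, else from c distinct random data points.
    if V0 is None:
        V = X[rng.choice(len(X), c, replace=False)]
    else:
        V = np.array(V0, dtype=float)
    for it in range(1, max_iter + 1):
        # Squared Euclidean distances, shape (n, c); clipped to avoid division by zero.
        d2 = np.maximum(((X[:, None, :] - V[None, :, :]) ** 2).sum(-1), 1e-12)
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)); each row sums to 1.
        w = d2 ** (-1.0 / (m - 1.0))
        U = w / w.sum(axis=1, keepdims=True)
        # Center update: mean of the data weighted by u^m.
        Um = U ** m
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:  # stop once the centers have essentially stopped moving
            break
    return V, U, it

def multistage_fcm(X, c, fractions=(0.05, 0.15), m=2.0, tol=1e-4, rng=None):
    """Hypothetical multistage scheme: cluster growing random samples,
    carrying the centers forward, then finish on the full data set."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    V = None
    for f in fractions:  # assumed sampling schedule, not taken from the paper
        size = min(len(X), max(c, int(f * len(X))))
        sample = X[rng.choice(len(X), size, replace=False)]
        V, _, _ = fcm(sample, c, V0=V, m=m, tol=tol, rng=rng)
    # Final stage: ordinary fuzzy c-means on all points, warm-started from V.
    return fcm(X, c, V0=V, m=m, tol=tol, rng=rng)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc, 0.3, size=(2000, 2)) for loc in (0.0, 3.0, 6.0)])
    V, U, iters = multistage_fcm(X, c=3, rng=1)
    print("centers:", V.round(2), "full-data iterations:", iters)
```

Because the last stage is ordinary fuzzy c-means on the full data set, its convergence behavior is unchanged, which is consistent with the abstract's claim. Any saving comes from the final stage needing fewer full-data iterations when it starts near the solution.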

Cited by (116)

IBRIDIA: A hybrid solution for processing big logistics data
2019, Future Generation Computer Systems
Citation excerpt:
Although the algorithm has the disadvantage of needing an expert intervention to specify many parameters before it works, its performance is better than HPStream algorithm [27]. In [28], a multi-level unordered sampling technique was suggested to boost the time performance of fuzzy C-means. The technique is double phased.
Internet of Things (IoT) is leading to a paradigm shift within the logistics industry. Logistics service providers use sensor technologies such as GPS or telemetry to track and manage their shipment processes. Additionally, they use external data that contain critical information about events such as traffic, accidents, and natural disasters. Correlating data from different sensors and social media and performing analysis in real time provide opportunities to predict events and prevent unexpected delivery delays at run time. However, collecting and processing data from heterogeneous sources creates problems due to the variety and velocity of the data. In addition, processing data in real time is so challenging that it cannot be handled by conventional logistics information systems. In this paper, we present a hybrid framework for processing massive volumes of data in both batch style and real time. Our framework is built upon Johnson’s hierarchical clustering (HCL) algorithm, which produces a dendrogram that represents different clusters of data objects.
A convex semi-nonnegative matrix factorisation approach to fuzzy c-means clustering
2015, Fuzzy Sets and Systems
We propose an alternative approach to fuzzy c-means clustering which eliminates the weighting exponent parameter of conventional algorithms. It is based on a particular convex factorisation of data matrix. The proposed method is invariant under certain linear transformations of the data including principal component analysis. We tested its accuracy using both synthetic data and real datasets, and compared it to that provided by the usual fuzzy c-means algorithm. We were able to ascertain that our proposal can be a credible yet easier alternative to this approach to fuzzy clustering. Moreover, it showed no noticeable sensitivity to the initial guess of the partition matrix.
Approximate spectral clustering with utilized similarity information using geodesic based hybrid distance measures
2015, Pattern Recognition
Citation excerpt:
We briefly explain these three methods below. Various sampling methods have been employed for clustering of large datasets to make spectral clustering feasible [12,16,17], to speed up fuzzy c-means [29], or to make support vector machines scalable [30]. Among them, random sampling is the fastest due to its non-parametric, straightforward approach, at the expense of relatively high error rates.
Spectral clustering has been popular thanks to its ability to extract clusters of varying characteristics without using a parametric model, at the expense of the high computational cost required for eigendecomposition of pairwise similarities. In order to utilize its advantages in large datasets where it is infeasible due to its computational burden, approximate spectral clustering (ASC) methods apply spectral clustering on a reduced set of points (data representatives) selected by sampling or quantization. This two-step approach (i.e. finding the representatives and then clustering them) brings new opportunities for precise similarity definition such as manifold based topological relations, data distribution within the Voronoi polyhedra of the representatives, and their geodesic distance information, which are often ignored in similarity definition for ASC. In this study, we propose geodesic based hybrid similarity criteria which enable the use of different types of information for accurate similarity representation in ASC. Although the geodesic concept has been widely used in clustering, our contribution is the unique way of representing data topology to form geodesic relations and jointly harnessing various information types including topology, distance and density. The proposed criteria are tested using both sampling (selective sampling) and quantization (neural gas and k-means++) approaches. Experiments on artificial datasets, well-known small/medium-size real datasets, and four large datasets (four remote-sensing images), with different types of clusters, show that the proposed geodesic based hybrid similarity criteria outperform traditional similarity criteria in terms of clustering accuracies and several cluster validity indices.
Generalization rules for the suppressed fuzzy c-means clustering algorithm
2014, Neurocomputing
Citation excerpt:
The first accelerated FCM algorithms [5,25] used integer computation only. Cheng et al. [8] proposed data reduction based on random sampling, leading to a fast approximative FCM clustering. Higher speed has been also reached via data reduction.
Intending to achieve an algorithm characterized by the quick convergence of hard c-means (HCM) and finer partitions of fuzzy c-means (FCM), suppressed fuzzy c-means (s-FCM) clustering was designed to augment the gap between high and low values of the fuzzy membership functions. Suppression is produced via modifying the FCM iteration by creating a competition among clusters: for each input vector, lower degrees of membership are proportionally reduced, being multiplied by a previously set constant suppression rate, while the largest fuzzy membership grows to maintain the probabilistic constraint. Even though so far it was not treated as an optimal algorithm, it was employed in a series of applications, and reported to be accurate and efficient in various clustering problems. In this paper we introduce some generalized formulations of the suppression rule, leading to an infinite number of new clustering algorithms. Further on, we identify the close relation between s-FCM clustering models and the so-called FCM algorithm with generalized improved partition (GIFP-FCM). Finally we reveal the constraints under which the generalized s-FCM clustering models minimize the objective function of GIFP-FCM, allowing us to call our suppressed clustering models optimal. Based on a large number of numerical tests performed in multidimensional environments, several generalized forms of suppression proved to give more accurate partitions than earlier solutions, needing significantly fewer iterations than the conventional FCM.
From Soft Clustering to Hard Clustering: A Collaborative Annealing Fuzzy c-Means Algorithm
2024, IEEE Transactions on Fuzzy Systems
Micro-segmentation of retinal image lesions in diabetic retinopathy using energy-based fuzzy C-Means clustering (EFM-FCM)
2024, Microscopy Research and Technique

Short communication