Multiobjective clustering analysis using particle swarm optimization
Introduction
Huge amounts of data are currently being collected and stored in databases, and this quantity continues to grow rapidly. Valuable information, still hidden in these data, should be revealed to improve the decision-making process in organizations. Data mining encompasses the methodologies that apply data analysis techniques to discover previously unknown, valid patterns and relationships in large datasets. These methods include a number of technical approaches, such as classification, data summarization, dependency network finding, regression, anomaly detection, and clustering (Han & Kamber, 2000). Clustering is the process of partitioning data into groups with the desired property that data in the same group are similar, while data from different groups are dissimilar. Data clustering has its roots in several areas, such as data mining, machine learning, biology, and statistics (Cheng, Yang, & Cao, 2013; Kao, Zahara, & Kao, 2008; Leung, Zhang, & Xu, 2000; Nguyen & Cios, 2008; Qiu, Xu, Gao, Li, & Chi, 2016; Saha, Alok, & Ekbal, 2016; Sahoo, Zuo, & Tiwari, 2012; Thong et al., 2015).
Generally speaking, most existing clustering methods fall into two families: hierarchical and partitional. Hierarchical clustering produces a tree in which each internal node embodies other nodes (i.e., its children), down to the leaves (Leung et al., 2000). Hierarchical clustering algorithms do not need to know the number of clusters in advance and are independent of the initial conditions. On the other hand, they are typically “greedy”: objects assigned to a cluster cannot be reassigned to other clusters later in the clustering process. Moreover, lacking information about the global shape or size of the clusters, these algorithms may be unable to separate overlapping clusters (Jain, Murty, & Flynn, 1999). Partitional clustering, in contrast, decomposes a dataset into a set of disjoint clusters. Many partitional clustering algorithms try to minimize some measure of dissimilarity among objects that belong to the same cluster while maximizing the dissimilarity among objects that belong to different clusters. In summary, the main drawbacks of hierarchical algorithms are usually advantages of partitional algorithms, and vice versa (Frigui & Krishnapuram, 1999).
Swarm intelligence (SI) is an innovative subcategory of artificial intelligence, inspired by the intelligent behavior of insect or animal groups in nature, including ant colonies, bird flocks, fish schools, bee colonies, and bacterial swarms (Kennedy & Eberhart, 2001). In recent years, SI methods such as swarm-based clustering algorithms have been successfully used to deal with clustering problems (Abraham, Das, & Roy, 2008; Bharne, Gulhane, & Yewale, 2011; Das, Abraham, & Konar, 2008; Grosan, Abraham, & Chis, 2006; Jiang, Li, Yi, Wang, & Hu, 2011; Omran, Salman, & Engelbrecht, 2006). For this reason, the research community has recently given them special attention, mainly because swarm-based approaches are particularly suited to exploratory analysis and because many issues in this field are still open (Abraham et al., 2008).
In this paper, we confine ourselves to the application of particle swarm optimization (PSO) to clustering. Like other SI methods, PSO is inspired by a phenomenon that occurs in nature, i.e., the social behavior of bird flocking or fish schooling (Poli, Kennedy, & Blackwell, 2007). Two PSO-based clustering methods are reported in Rana, Jasola, and Kumar (2011): the first finds the centroids for a user-specified number of clusters, whereas the second extends PSO with K-means (used to seed the initial swarm). The latter algorithm is shown to have better convergence than the classical version of K-means. Yang et al. propose a hybrid clustering algorithm, PSOKHM, based on PSO and K-harmonic means (KHM) (Yang, Sun, & Zhang, 2009). They show that PSOKHM increases the convergence speed of PSO, is capable of escaping from local optima, and outperforms both PSO and KHM clustering on seven datasets. A multiobjective PSO and simulated annealing clustering algorithm (MOPSOSA) is proposed in Abubaker, Baharum, and Alrefaei (2015). This method simultaneously optimizes three different objective functions, used as cluster validity indexes, to find the proper number of clusters (and the clusters themselves) for the given dataset; the indexes are based on Euclidean distance, point symmetry, and short distances. MOPSOSA obtains more promising results than other conventional clustering algorithms. Several other PSO-based clustering algorithms have been proposed in the literature (for a comprehensive review of PSO-based clustering, the interested reader may consult Cura, 2012; Izakian & Abraham, 2011; Kalyani & Swarup, 2011; Sarkar, Roy, & Purkayastha, 2013; Tsai & Kao, 2011).
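To ground the discussion, the basic scheme shared by many PSO-based clustering algorithms can be sketched as follows: each particle encodes a set of candidate centroids, and the swarm minimizes a within-cluster distance fitness. This is a generic illustration, not the algorithm of any specific cited work; all function names and parameter values are our own.

```python
import random

def sse(centroids, points):
    """Fitness: sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids)
               for p in points)

def pso_cluster(points, k, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO over flat position vectors that each encode k centroids."""
    rng = random.Random(seed)
    dim = len(points[0])

    def decode(x):  # split a flat vector into k centroids
        return [x[i * dim:(i + 1) * dim] for i in range(k)]

    # Initialize particles uniformly inside the data bounding box.
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    pos = [[rng.uniform(lo[d % dim], hi[d % dim]) for d in range(k * dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [list(x) for x in pos]
    pbest_f = [sse(decode(x), points) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = sse(decode(pos[i]), points)
            if f < pbest_f[i]:       # update personal best
                pbest[i], pbest_f[i] = list(pos[i]), f
                if f < gbest_f:      # update global best
                    gbest, gbest_f = list(pos[i]), f
    return decode(gbest), gbest_f
```

On two well-separated clumps of 2-D points, the swarm drives the fitness close to the optimal sum of squared errors; hybrids such as those cited above replace the random initialization or the fitness function.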
However, most of these algorithms consider a single function as the objective of the clustering problem and, to the best of our knowledge, recent works on multiobjective clustering do not apply the concept of Pareto-optimal solutions (Kasprzak & Lewis, 2001).
In this paper, a multiobjective clustering particle swarm optimization (MCPSO, hereinafter) framework is proposed, which obtains well-separated, connected, and compact clusters, regardless of the expected optimal number of clusters and their characteristics. MCPSO is also able to automatically determine the optimal number of clusters. To achieve these goals, two conflicting objective functions are defined, based on the concepts of connectivity and cohesion, and MCPSO uses them to find a set of non-dominated clustering solutions, called the Pareto front. A simple decision maker is then used to select the best solution among the Pareto solutions. The performance of MCPSO has also been compared against that of four state-of-the-art clustering algorithms. As the selected datasets are labeled, we have been able to measure the average “accuracy” on clusters, assuming that each cluster actually accounts for a unique label. The accuracy measured on the clustering results, together with the required computational time, are used as performance metrics in the comparative analysis.
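For readers unfamiliar with Pareto optimality: with two objectives to be minimized, a solution is non-dominated when no other solution is at least as good in both objectives and strictly better in one. A minimal sketch of dominance and front extraction (our own illustrative code, not part of MCPSO itself):

```python
def dominates(a, b):
    """a dominates b iff a is no worse in every objective and strictly better
    in at least one (both objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# Objective vectors (f1, f2): the trade-off solutions survive.
print(pareto_front([(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]))
# → [(1, 5), (2, 2), (5, 1)]
```

The decision maker mentioned above then picks one solution from this front; the selection rule itself is described later in the paper.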
The rest of this paper is organized as follows. In Section 2, swarm intelligence and multiobjective optimization are defined. The proposed MCPSO algorithm and the clustering objective functions are described in detail in Section 3. A comprehensive set of experimental results are provided in Section 4. Section 5 reports conclusions.
Section snippets
Multiobjective optimization and swarm intelligence
In the area of metaheuristics, swarm intelligence (SI) belongs to the group of approaches that apply the self-organized and decentralized characteristics of natural or artificial phenomena to deal with complex optimization problems. In particular, the behavior of natural individuals who relate to each other and to their environment plays a significant role in designing SI algorithms. Many of these algorithms have been introduced in recent years and have been successfully applied to different …
Multiobjective clustering with particle swarm optimization
In this section, we describe the MCPSO method. As already pointed out, it is based on the particle swarm optimization algorithm (Kennedy & Eberhart, 2001), in a multiobjective setting. MCPSO consists of two main phases: optimization and decision making. Two conflicting objective functions are defined, based on connectivity and cohesion, with the aim of obtaining well-separated, compact, and connected clusters. The optimization phase results in a set of optimal solutions for the given clustering …
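The exact objective definitions appear later in the paper; as a hedged illustration of how cohesion- and connectivity-style objectives are commonly formulated in multiobjective clustering, consider the following simplified versions (names and formulas are our own assumptions, not MCPSO's definitions):

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cohesion(points, labels):
    """Overall within-cluster deviation: distance of each point to its
    cluster mean. Lower means more compact clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for members in clusters.values():
        mean = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(euclid(p, mean) for p in members)
    return total

def connectivity(points, labels, n_neighbors=3):
    """Penalty accumulated whenever a point's j-th nearest neighbor lies in a
    different cluster. Lower means better-connected clusters."""
    penalty = 0.0
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: euclid(p, points[j]))
        for rank, j in enumerate(order[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                penalty += 1.0 / rank
    return penalty
```

Objectives of this kind conflict by design: cohesion keeps improving as clusters are split into ever smaller groups, while connectivity favors keeping neighboring points together, so optimizing both yields a front of trade-off clusterings.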
Experimental results and discussion
In this section, we empirically evaluate the performance of MCPSO. After a set of experiments aimed at finding a preliminary setting for the MCPSO parameters (using some pilot datasets), the performance of the proposed algorithm has been compared with that of other clustering algorithms over a set of known benchmarks. MCPSO has been implemented in Python 2.7.6 on an Intel Core i7 (2.4 GHz, 8 GB RAM) under Ubuntu 14.04.
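As noted in the introduction, each cluster is assumed to account for a unique label; a minimal reading of this “accuracy” metric (our own sketch, possibly simpler than the exact procedure used in the paper) maps every predicted cluster to its majority ground-truth label and scores the fraction of points matched:

```python
from collections import Counter

def cluster_accuracy(pred_clusters, true_labels):
    """Map each predicted cluster to its majority ground-truth label,
    then return the fraction of points whose label matches."""
    by_cluster = {}
    for c, t in zip(pred_clusters, true_labels):
        by_cluster.setdefault(c, []).append(t)
    correct = sum(Counter(ts).most_common(1)[0][1] for ts in by_cluster.values())
    return correct / len(true_labels)

print(cluster_accuracy([0, 0, 0, 1, 1, 1], ['a', 'a', 'b', 'b', 'b', 'b']))
# → 5/6 ≈ 0.833 (cluster 0 maps to 'a', cluster 1 maps to 'b')
```

Note that this score is only meaningful on labeled benchmarks and rewards clusterings whose groups align with the ground-truth classes.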
Conclusions
Clustering is one of the key tasks of exploratory data mining and the subject of active research in several fields, including finance, information retrieval, network management, biology, and medicine. These fields need accurate grouping of huge datasets that may come with a variety of features and/or data characteristics. Swarm intelligence (SI) is a relatively new interdisciplinary field of research, which has gained huge popularity in the data mining area. SI methodologies, such as …
References (60)
- Clustering using pk-d: a connectivity and density dissimilarity, Expert Systems with Applications (2016)
- A new clustering algorithm based on near neighbor influence, Expert Systems with Applications (2015)
- Dynamic genetic algorithms for the dynamic load balanced clustering problem in mobile ad hoc networks, Expert Systems with Applications (2013)
- Chaotic particle swarm optimization for data clustering, Expert Systems with Applications (2011)
- A particle swarm optimization approach to clustering, Expert Systems with Applications (2012)
- An adaptive neighbourhood construction algorithm based on density and connectivity, Pattern Recognition Letters (2015)
- Fuzzy c-means and fuzzy swarm for fuzzy clustering problem, Expert Systems with Applications (2011)
- Combinatorial particle swarm optimization (CPSO) for partitional clustering problem, Applied Mathematics and Computation (2007)
- A new hybrid method based on partitioning-based DBSCAN and ant clustering, Expert Systems with Applications (2011)
- Particle swarm optimization based k-means clustering approach for security assessment in power systems, Expert Systems with Applications (2011)
- A hybridized approach to data clustering, Expert Systems with Applications
- GAKREM: a novel hybrid clustering algorithm, Information Sciences
- Artificial bee colony (ABC) for multi-objective design optimization of composite structures, Applied Soft Computing
- Multi-stage design space reduction and metamodeling optimization method based on self-organizing maps and fuzzy clustering, Expert Systems with Applications
- Application of a fuzzy feasibility Bayesian probabilistic estimation of supply chain backorder aging, unfilled backorders, and customer wait time using stochastic simulation with Markov blankets, Expert Systems with Applications
- Brain image segmentation using semi-supervised clustering, Expert Systems with Applications
- A data clustering algorithm for stratified data partitioning in artificial neural network, Expert Systems with Applications
- Hybrid methods for fuzzy clustering based on fuzzy c-means and improved particle swarm optimization, Expert Systems with Applications
- HIFCF: an effective hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, Expert Systems with Applications
- Particle swarm optimization with selective particle regeneration for data clustering, Expert Systems with Applications
- An efficient hybrid data clustering method based on k-harmonic means and particle swarm optimization, Expert Systems with Applications
- Swarm intelligence algorithms for data clustering, in Proceedings of the Soft Computing for Knowledge Discovery and Data Mining
- Automatic clustering using multi-objective particle swarm and simulated annealing, PLOS One
- Multiple objective ant colony optimisation, Swarm Intelligence
- UCI machine learning repository
- Data clustering algorithms based on swarm intelligence, in Proceedings of the 3rd International Conference on Electronics Computer Technology (ICECT)
- A new cluster validity measure and its application to image compression, Pattern Analysis and Applications
- Solving multiobjective optimization problems using an artificial immune system, Genetic Programming and Evolvable Machines
- MOPSO: a proposal for multiple objective particle swarm optimization, in Proceedings of the Congress on Evolutionary Computation, CEC'02
- Automatic clustering using an improved differential evolution algorithm, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans