A novel combinatorial merge-split approach for automatic clustering using imperialist competitive algorithm
Introduction
Clustering is a popular technique in data analysis and data mining that partitions a set of unlabeled data into several groups (or clusters), so that data in one cluster are as similar to each other as possible and as different as possible from data in other clusters (Armano and Farmani, 2016, Panagiotakis, 2015). Cluster analysis is widely applied in many areas, including pattern recognition (Kalhori and Zarandi, 2015, Liu et al., 2016), image processing (Rodriguez and Laio, 2014, Saha et al., 2016, Thong, 2015), web mining (Forsati et al., 2015, Huang et al., 2014), compression (Hejrati, Fathi, & Abdali-Mohammadi, 2017), and information retrieval (Bordogna and Pasi, 2012, Chifu et al., 2015). Many clustering methods have been proposed, and useful reviews of them can be found in (Hruschka et al., 2009, Saxena et al., 2017, Xu and Wunsch, 2005). The existing algorithms are usually divided into two groups: hierarchical clustering and partitional clustering. In hierarchical clustering, data are arranged in a hierarchical tree structure based on the similarity between data points. In these methods, once a data point is assigned to a cluster in the initial steps of clustering, it cannot be reassigned to another cluster. Therefore, the formation of clusters is static. Moreover, the overall shape and size of clusters are ignored. Partitional clustering, on the other hand, attempts to decompose the dataset directly into a set of separate clusters, so that intra-cluster dissimilarity is small and inter-cluster dissimilarity is large. Partitional clustering algorithms assume that the number of clusters in a dataset is known in advance, whereas in many real-world problems this information is not available. Under such conditions, automatically determining an appropriate number of clusters and providing a proper partition of the dataset remain challenging.
Automatic clustering is a promising solution to this challenge: it automatically determines the number and structure of clusters in a dataset (Kuo, Huang, Lin, Wu, & Zulvia, 2014). Automatic clustering is difficult when the dataset has high dimensionality and massive volume, especially when clusters differ greatly in shape, size, and density. It is also hard when groups overlap (José-García & Gómez-Flores, 2016). In recent years, many efforts have been made to develop automatic clustering methods. Three comprehensive reviews of these methods can be found in (Hancer and Karaboga, 2017, José-García and Gómez-Flores, 2016, Mirkin, 2011). Hancer and Karaboga (2017) divided automatic clustering methods into three groups: traditional, merge-split based, and evolutionary computation (EC) based approaches. In traditional approaches, a cluster validity index is chosen and a traditional clustering algorithm is run successively for every candidate number of clusters in order to find the clustering with the best validity value. However, this procedure is tedious and computationally expensive. Also, many validity indices work well only when their assumptions about the cluster structure hold (Tan, Ting, & Teng, 2011). Merge-split based approaches merge and split clusters in a dataset according to predetermined criteria. EC-based approaches use evolutionary algorithms for automatic clustering. In these approaches, clustering is treated as an optimization problem that minimizes the dissimilarity within clusters and maximizes the dissimilarity between clusters (Kuo et al., 2014). EC-based approaches generally outperform traditional and merge-split based approaches in obtaining the correct number of clusters and high-quality clusterings, because their global search mechanisms enable them to find better solutions.
As a result, these methods are more robust than the other two groups of methods (Hancer & Karaboga, 2017). In recent years, EC-based approaches such as the genetic algorithm (GA) (Tseng & Yang, 2001), particle swarm optimization (PSO) (Omran, Salman, & Engelbrecht, 2006), differential evolution (DE) (Das & Konar, 2009), bee colony optimization (BCO) (Kuo et al., 2014), and improved versions of some of these algorithms (Ali, 2016, Das et al., 2008a, Ling et al., 2016, Ozturk et al., 2015, Sheng et al., 2016) have been used for automatic clustering.
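The traditional approach described above (run a clustering algorithm for every candidate number of clusters and keep the result with the best validity value) can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from the cited works: it assumes k-means with a deterministic farthest-first seeding as the clustering algorithm and a simple compactness-to-separation ratio as the validity index, and the names `farthest_first`, `assign`, `kmeans`, `validity`, and `best_k` are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points, k, max_iters=100):
    """Plain Lloyd iterations; returns (centers, labels)."""
    centers = [tuple(c) for c in farthest_first(points, k)]
    for _ in range(max_iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster simply keeps its previous center.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def validity(points, centers, labels):
    """Illustrative index (lower is better): mean squared point-to-center
    distance divided by the smallest squared center-to-center distance."""
    intra = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels)) / len(points)
    inter = min(math.dist(a, b) ** 2 for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra / inter if inter > 0 else math.inf


def best_k(points, k_min, k_max):
    """Try every K in [k_min, k_max] and keep the clustering with the best index.
    Requires 2 <= k_min <= k_max <= len(points)."""
    best = None
    for k in range(k_min, k_max + 1):
        centers, labels = kmeans(points, k)
        score = validity(points, centers, labels)
        if best is None or score < best[0]:
            best = (score, k, labels)
    return best[1], best[2]


if __name__ == "__main__":
    # Three small, well-separated groups of 2-D points (made-up data).
    offsets = (-0.4, 0.0, 0.4)
    data = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 10), (20, 0)]
            for dx in offsets for dy in offsets]
    k, labels = best_k(data, 2, 6)
    print("selected number of clusters:", k)
```

The sketch also shows why this approach is costly: the clustering algorithm is run once per candidate K, and the outcome depends on whether the chosen index matches the actual cluster structure.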
The imperialist competitive algorithm (ICA) is an evolutionary optimization algorithm that simulates the social and political behavior of imperialist countries attempting to dominate weaker countries. It was proposed by Atashpaz-Gargari and Lucas in 2007. In recent years, ICA and its improved variants have obtained successful results in solving practical and numerical optimization problems (Ardeh et al., 2017, Niknam et al., 2011, Xu et al., 2017, Aliniya and Keyvanpour, 2018b, Aliniya and Keyvanpour, 2018a). The most important advantage of ICA over other evolutionary optimization methods is its high convergence rate; thus, it obtains significant results in a shorter time (Xu et al., 2017). An improved variant of ICA called hybrid K-MICA was presented by Niknam et al. in 2011 for solving clustering problems. Experiments showed that hybrid K-MICA is an efficient metaheuristic for finding optimal or near-optimal solutions to clustering problems. It is competitive with other evolutionary methods such as PSO, GA, and ant colony optimization (ACO) in terms of solution quality, and it is superior to them in terms of convergence rate. However, the number of clusters must be predetermined in this algorithm.
To address these issues, this paper proposes a new algorithm called automatic clustering using ICA (AC-ICA) for finding the optimal number of clusters. To the best of our knowledge, this is the first application of ICA to automatic clustering problems. AC-ICA simultaneously finds the number of clusters and the corresponding partitions. In the proposed algorithm, the movement of colonies toward the imperialist in the assimilation step is changed, which reinforces the ability to explore the solution space. Furthermore, a new method for changing the number of centers in solutions is proposed, and an efficient method is introduced for reinitializing empty cluster centers. In addition, the initialization and imperialist competition steps are changed so that AC-ICA can be used for automatic clustering. Based on the changes in these two steps, a framework is proposed for adapting different ICA variants to solve automatic clustering problems efficiently. With the help of this framework, the basic ICA and three recently developed variants, namely EXPLICA (Ardeh et al., 2017), Hybrid K-MICA (Niknam et al., 2011), and IICA-G (Xu et al., 2017), have been adapted, and their performance in automatic clustering has been compared with that of AC-ICA. The Taguchi design approach (Al Khaled & Hosseini, 2015) is used to calibrate the parameters of the proposed algorithm. Comparison of the results of AC-ICA with those of the basic ICA, its three recently developed variants, and several state-of-the-art automatic clustering methods shows the success of the proposed algorithm. Finally, AC-ICA was applied to a real application (face recognition) and achieved acceptable results. The rest of the paper is organized as follows:
In the next section, automatic clustering techniques are briefly reviewed. In Section 3, automatic clustering approaches are compared. In Section 4, the basic ICA is described. The motivation and mathematical foundation of the proposed algorithm are provided in Section 5. Synthetic and real-world datasets, experimental setups, and experimental results are reported in Section 6, together with the results of AC-ICA on face recognition. Finally, Section 7 concludes the paper and gives directions for future research.
Section snippets
Related work
As already mentioned, compared with the traditional and merge-split based approaches, EC-based approaches generally perform well in obtaining the correct number of clusters and high clustering quality. In this section, some successful EC-based methods are reviewed.
In the late 1990s, the automatic clustering problem gave rise to a new era in cluster analysis with the application of nature-inspired metaheuristics. The earliest attempt at automatic clustering based on GA was done by Tseng and
Comparison of automatic clustering approaches
As already mentioned, Hancer and Karaboga (2017) divided automatic clustering methods into three groups: traditional, merge-split based, and evolutionary computation (EC) based approaches. Fig. 1 shows the overall structure of these three approaches. In traditional approaches, an internal validity index is first chosen and a range for the number of clusters [Kmin, Kmax] is defined. Then, the clustering algorithm is run for every number of clusters in this range. The index value of
The basic ICA
Since the proposed method in this paper is based on ICA, in this section, the steps of ICA algorithm are briefly discussed. ICA is a population-based random search algorithm (Atashpaz-Gargari & Lucas, 2007). This algorithm has 7 steps as follows:
Step 1: Initialization of the empires
ICA is initialized with a set of Npop randomly generated solutions. Each solution is a 1 × D array called “the country”. In Eq. (1), xi is the ith parameter of the solution and D is the number of parameters
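Step 1 can be sketched in Python as follows. Only the random generation of Npop solutions and the layout of a country as an array of D parameters come from the text above. Selecting the lowest-cost countries as imperialists and dividing the remaining colonies in proportion to normalized power follow the commonly described basic ICA and are assumptions here, as are the function name `init_empires`, its parameters (`cost`, `n_imp`, `low`, `high`, `seed`), and the largest-remainder rounding used to hand out colonies.

```python
import random


def init_empires(cost, n_pop, n_imp, dim, low, high, seed=0):
    """Create n_pop random countries and group them into n_imp empires.

    cost: function mapping a country (list of dim floats) to a number,
          lower is better. All names here are hypothetical.
    """
    rng = random.Random(seed)
    # Each country is a 1 x D array of parameters x_1 .. x_D.
    countries = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_pop)]
    countries.sort(key=cost)  # strongest (lowest cost) first
    imperialists, colonies = countries[:n_imp], countries[n_imp:]

    # Normalized cost: subtract the worst imperialist cost, so values are <= 0.
    costs = [cost(c) for c in imperialists]
    worst = max(costs)
    normalized = [c - worst for c in costs]
    total = sum(normalized)
    # Power of each imperialist; equal shares if all costs coincide.
    powers = [abs(n / total) for n in normalized] if total != 0 else [1.0 / n_imp] * n_imp

    # Share out the colonies in proportion to power (largest-remainder
    # rounding is a choice of this sketch so that the counts sum exactly).
    n_col = len(colonies)
    quotas = [p * n_col for p in powers]
    counts = [int(q) for q in quotas]
    by_remainder = sorted(range(n_imp), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[: n_col - sum(counts)]:
        counts[i] += 1

    # Hand out the colonies at random.
    rng.shuffle(colonies)
    empires, start = [], 0
    for imp, n in zip(imperialists, counts):
        empires.append({"imperialist": imp, "colonies": colonies[start:start + n]})
        start += n
    return empires


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)  # toy cost function
    for e in init_empires(sphere, n_pop=20, n_imp=4, dim=3, low=-5.0, high=5.0):
        print(round(sphere(e["imperialist"]), 3), len(e["colonies"]))
```

Under this normalization the weakest imperialist has zero power and may start with no colonies, so stronger imperialists begin with larger empires.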
The proposed method
In this section, our motivation and then the proposed automatic clustering algorithm are explained. The goal of the proposed algorithm is to find the correct number of clusters as well as a high-quality clustering for an automatic clustering problem.
Experimental results
In this section, the performance of the proposed AC-ICA is compared with the basic ICA and three improved types as well as other automatic clustering methods. Section 6.1 explains the synthetic and real-world datasets. Section 6.2 shows the application of Taguchi method for adjusting parameters. Section 6.3 briefly reviews the criteria used for evaluating the results of algorithms. In Section 6.4, the performance of AC-ICA on synthetic datasets is evaluated. In Section 6.5, the performance of
Conclusion and future works
In this paper, an automatic clustering algorithm based on ICA, called AC-ICA, is proposed. For this purpose, a new method for changing the number of centers in solutions during the evolution is proposed, and an efficient method is introduced for reinitializing empty cluster centers. In the proposed algorithm, by changing the movement of colonies toward the imperialist in the assimilation step, the ability to explore the solution space and maintain diversity among all solutions with
Author contribution statement
Zahra Aliniya: Formal analysis; Investigation; Methodology; Software; Validation; Visualization; Roles/Writing - original draft. Seyed Abolghasem Mirroshandel: Conceptualization; Methodology; Supervision; Validation; Writing - review & editing.
References (70)
et al. (2017). EXPLICA: An Explorative Imperialist Competitive Algorithm based on the notion of Explorers with an expansive retention policy. Applied Soft Computing.
et al. (2016). Multiobjective clustering analysis using particle swarm optimization. Expert Systems with Applications.
et al. (2002). Genetic clustering for automatic evolution of clusters and application to image classification. Pattern Recognition.
et al. (2012). A quality driven hierarchical data divisive soft clustering for information retrieval. Knowledge-Based Systems.
et al. (2017). Natural neighbor-based clustering algorithm with local representatives. Knowledge-Based Systems.
et al. (2015). Word sense discrimination in information retrieval: A spectral clustering-based approach. Information Processing & Management.
et al. (2008). Automatic kernel clustering with a multi-elitist particle swarm optimization algorithm. Pattern Recognition Letters.
et al. (2009). Automatic image pixel clustering with an improved differential evolution. Applied Soft Computing.
et al. (2010). Kernel-induced fuzzy clustering of image pixels with an improved differential evolution algorithm. Information Sciences.
et al. (2011). Face recognition using histograms of oriented gradients. Pattern Recognition Letters.
A comprehensive survey of traditional, merge-split and evolutionary approaches proposed for determination of cluster number. Swarm and Evolutionary Computation.
Black hole: A new heuristic optimization approach for data clustering. Information Sciences.
A combined approach for clustering based on K-means and gravitational search algorithms. Swarm and Evolutionary Computation.
Efficient lossless multi-channel EEG compression based on channel clustering. Biomedical Signal Processing and Control.
Automatic multi-objective clustering based on game theory. Expert Systems with Applications.
A survey on the imperialist competitive algorithm metaheuristic: Implementation in engineering domain and directions for future research. Applied Soft Computing.
Automatic clustering using nature-inspired metaheuristics: A survey. Applied Soft Computing.
Automatic cluster evolution using gravitational search algorithm and its application on image segmentation. Engineering Applications of Artificial Intelligence.
Automatic kernel clustering with bee colony optimization algorithm. Information Sciences.
Integration of particle swarm optimization and genetic algorithm for dynamic clustering. Information Sciences.
How many clusters? A robust PSO-based local density model. Neurocomputing.
Analyzing documents with Quantum Clustering: A novel pattern recognition algorithm based on quantum mechanics. Pattern Recognition Letters.
Gene transposon based clone selection algorithm for automatic clustering. Information Sciences.
An efficient hybrid algorithm based on modified imperialist competitive algorithm and K-means for data clustering. Engineering Applications of Artificial Intelligence.
A novel binary artificial bee colony algorithm based on genetic operators. Information Sciences.
Validity index for crisp and fuzzy clusters. Pattern Recognition.
An automatic clustering algorithm inspired by membrane computing. Pattern Recognition Letters.
Brain image segmentation using semi-supervised clustering. Expert Systems with Applications.
A review of clustering techniques and developments. Neurocomputing.
HIFCF: An effective hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis. Expert Systems with Applications.
A genetic approach to the automatic clustering problem. Pattern Recognition.
Density core-based clustering algorithm with dynamic scanning radius. Knowledge-Based Systems.
Fuzzy adaptive imperialist competitive algorithm for global optimization. Neural Computing and Applications.
Unsupervised clustering based an adaptive particle swarm optimization algorithm. Neural Processing Letters.
Solving constrained optimization problems using the improved imperialist competitive algorithm and Deb's technique. Journal of Experimental & Theoretical Artificial Intelligence (TETA).