Elsevier

Neurocomputing

Volume 97, 15 November 2012, Pages 241-250

A new approach for data clustering using hybrid artificial bee colony algorithm

https://doi.org/10.1016/j.neucom.2012.04.025

Abstract

Data clustering is a popular data analysis technique needed in many fields. In recent years, several swarm intelligence-based approaches to clustering have been proposed and have achieved encouraging results. This paper presents a Hybrid Artificial Bee Colony (HABC) algorithm for data clustering. HABC is motivated by the goal of enhancing the information exchange (social learning) between bees, which it does by introducing the crossover operator of the Genetic Algorithm (GA) into ABC. In tests on ten benchmark functions, the proposed HABC algorithm shows significant improvement over canonical ABC and several other comparison algorithms. The HABC algorithm is then employed for data clustering on six real datasets selected from the UCI machine learning repository. The results show that the HABC algorithm achieved better results than the other algorithms and is a competitive approach for data clustering.

Introduction

Swarm Intelligence (SI) is an innovative artificial intelligence technique inspired by the intelligent behaviors of insect or animal groups in nature, such as ant colonies, bird flocks, bee colonies, and bacterial swarms. In recent years, many SI algorithms have been proposed, such as Ant Colony Optimization (ACO) [1], Particle Swarm Optimization (PSO) [2], Immune Algorithm (IA) [3], and Bacterial Foraging Optimization (BFO) [4]. The Artificial Bee Colony (ABC) algorithm is a novel swarm intelligence algorithm inspired by the foraging behavior of honeybee colonies. It was first introduced by Karaboga in 2005 [5]. Because ABC is simple in concept, easy to implement, and has fewer control parameters, it has attracted the attention of researchers and has been widely used to solve many numerical optimization [6], [7] and engineering optimization problems [8], [9], [10].

However, the convergence speed of the ABC algorithm decreases as the dimension of the problem increases [6]. This is easy to explain: in each food source search step of the ABC algorithm, a bee exchanges information on only one dimension with one random neighbor. As the dimension increases, this information exchange becomes limited and its effect is weakened. In this paper, a Hybrid Artificial Bee Colony (HABC) algorithm is proposed to improve the optimization ability of canonical ABC. In HABC, the crossover operator of the Genetic Algorithm (GA) is introduced to enhance the information exchange between bees. A large set of benchmark functions is used to test the performance of the HABC algorithm against several other algorithms. The results show that the HABC algorithm clearly outperforms the other algorithms in terms of accuracy, robustness, and convergence speed.
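
The one-dimension exchange described above can be sketched as follows. This is a minimal illustration that assumes the commonly cited canonical ABC neighbor-search rule, v_ij = x_ij + phi * (x_ij - x_kj), with phi drawn uniformly from [-1, 1]. The function name `neighbor_search` and the list-of-lists population layout are assumptions made for the example and are not taken from the paper.

```python
import random


def neighbor_search(population, i, rng=random):
    """Return a candidate for food source i that differs in at most one dimension.

    Assumed canonical ABC rule: v_ij = x_ij + phi * (x_ij - x_kj),
    where k != i is a random neighbor and phi is uniform in [-1, 1].
    """
    dim = len(population[i])
    j = rng.randrange(dim)  # one randomly chosen dimension
    k = rng.choice([s for s in range(len(population)) if s != i])  # random neighbor
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(population[i])  # copy: every other dimension is left unchanged
    candidate[j] = population[i][j] + phi * (population[i][j] - population[k][j])
    return candidate


if __name__ == "__main__":
    rng = random.Random(0)
    pop = [[rng.uniform(-5, 5) for _ in range(30)] for _ in range(10)]
    cand = neighbor_search(pop, 0, rng)
    changed = sum(a != b for a, b in zip(pop[0], cand))
    print(f"dimensions changed: {changed} of {len(cand)}")
```

With 30 dimensions, a single step touches at most one coordinate of one food source, which is the limitation the paragraph attributes to canonical ABC in high-dimensional problems.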

Clustering is a widely encountered problem that often needs to be solved as part of complicated tasks in data mining [11], pattern recognition [12], image analysis [13] and other fields of science and engineering. The aim of data clustering is to partition a set of data into several clusters according to some predefined attributes, such that the data in the same cluster are similar to each other and data in different clusters are dissimilar. Existing clustering algorithms can be broadly classified into two categories: hierarchical clustering and partitional clustering [14]. Hierarchical clustering groups the objects into successively fewer structures, while partitional clustering divides the objects into a predefined number of clusters according to some optimization criterion. In this paper, we focus on partitional clustering; hierarchical clustering is not discussed in detail. The most popular algorithms for partitional clustering are the center-based clustering algorithms, of which the K-means algorithm is a typical example. Owing to its simplicity and efficiency, the K-means algorithm has been widely used in past years. However, it has shortcomings: the algorithm is sensitive to its initial cluster centers and is easily trapped in local minima. To overcome these problems, many heuristic clustering algorithms have been introduced. For example, Krishna and Murty proposed a novel approach called the genetic K-means algorithm for clustering analysis, which uses a specific distance-based mutation derived from the mutation operator of GA [15]. Selim and Al-Sultan proposed a simulated annealing approach for solving the clustering problem [16].
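
The sensitivity of K-means to its initial centers can be seen in a small sketch. The code below is a plain Lloyd-style K-means written for illustration only. The helper names (`assign`, `sse`, `kmeans`) and the four-point toy dataset are assumptions and do not come from the paper.

```python
import math


def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))


def kmeans(points, centers, iters=100):
    """Lloyd-style K-means started from the given initial centers."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            # An empty cluster keeps its previous center.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    labels = assign(points, centers)
    return centers, labels, sse(points, centers, labels)


if __name__ == "__main__":
    # Corners of a 4 x 1 rectangle; the natural split is left pair vs right pair.
    pts = [(0, 0), (0, 1), (4, 0), (4, 1)]
    print(kmeans(pts, [(0, 0), (4, 0)])[2])  # 1.0  (left/right split)
    print(kmeans(pts, [(2, 0), (2, 1)])[2])  # 16.0 (stuck in a top/bottom split)
```

Both runs converge, but the second start is already a fixed point of the update and never leaves the poorer partition. This is the kind of local minimum that heuristic and swarm-based methods aim to escape.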

Over the last decade, as swarm intelligence optimization has attracted the attention of many researchers, different swarm intelligence-based clustering approaches have been proposed. Shelokar introduced an evolutionary algorithm based on the ACO algorithm for the clustering problem [17]; Merwe et al. used the PSO algorithm to solve the clustering problem [18], [19]; and Karaboga and Ozturk, and Zhang et al., used the ABC algorithm to solve the problem [20], [21]. Zou et al. proposed a Cooperative Artificial Bee Colony (CABC) algorithm to solve the clustering problem [22], in which a cooperative search technique was introduced. In this paper, given the excellent performance of the HABC algorithm on benchmark functions, it is employed for data clustering. The algorithm is tested on six well-known real datasets from the UCI database [23]. Several of the algorithms mentioned above are tested for comparison. The tests show that the proposed HABC algorithm achieved better results than the other algorithms on most datasets.

The rest of the paper is organized as follows. Section 2 introduces the canonical ABC algorithm. Section 3 discusses how the crossover operator is used in ABC and presents the details of the HABC algorithm. In Section 4, the HABC algorithm is tested on a set of benchmark functions and compared with several other algorithms, and the results are presented and discussed. Section 5 introduces the data clustering problem and how the K-means and HABC algorithms are used for clustering. Tests of HABC and the other algorithms on real clustering datasets are presented and discussed in Section 6. Finally, conclusions are drawn in Section 7.

Artificial bee colony algorithm

The Artificial Bee Colony algorithm is a recently proposed swarm intelligence algorithm inspired by the foraging behavior of bee colonies. It was first proposed by Karaboga [5] and then further developed by Karaboga, Basturk, Akay and others [6], [7], [24], [25]. In the ABC algorithm, the search space is simulated as the foraging environment, and each point in the search space corresponds to a food source (solution) that the artificial bees could exploit. The nectar amount of a food source represents

Hybrid artificial bee colony algorithm

Social learning is the most important factor in the formation of the collective knowledge of swarm intelligence. In the ABC algorithm, it is realized mainly through the neighbor searching procedure of the employed bees and onlooker bees. However, as mentioned above, in the canonical ABC algorithm a new food source is produced by changing the value of one randomly chosen dimension, learning from one randomly chosen bee. This means that information on only one bee and only one of its dimensions is exchanged in

Experiment

The proposed HABC algorithm is tested on a set of benchmark functions. Five other algorithms are used for comparison: canonical ABC, PSO, GA, CABC by Zou et al. [22] and Cooperative Particle Swarm Optimization (CPSO) by van den Bergh and Engelbrecht [30], that is, three classic original algorithms and two variants. CABC is a recently proposed algorithm that performs well. It introduces a cooperative search strategy. A virtual super best solution gb is recorded and

Data clustering

As mentioned above, this paper focuses mainly on partitional clustering. In a partitional clustering problem, we need to divide a set of n objects into k clusters. Let O = {o1, o2, …, on} be the set of n objects. Each object has p features, and each feature is quantified as a real value. Let Xn×p be the feature data matrix, with n rows and p columns. Each row represents one object, and xi,j is the jth feature of the ith object (i = 1, 2, …, n; j = 1, 2, …, p).
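
In swarm-based partitional clustering, a candidate solution is often encoded as k cluster centers, each a point in the p-dimensional feature space, and scored by the total distance from every object to its nearest center. The sketch below assumes that encoding and that objective. The paper's own formulation is cut off in this excerpt, so neither should be read as the authors' exact definition. The names `decode` and `clustering_fitness` are hypothetical.

```python
import math


def decode(position, k, p):
    """Split a flat vector of length k * p into k centers of p features each."""
    if len(position) != k * p:
        raise ValueError("position must have k * p entries")
    return [position[c * p:(c + 1) * p] for c in range(k)]


def clustering_fitness(X, position, k):
    """Total Euclidean distance from each row of X to its nearest center.

    X is the n x p data matrix as a list of rows; lower values are better.
    This objective is an assumption, not the paper's stated criterion.
    """
    p = len(X[0])
    centers = decode(position, k, p)
    return sum(min(math.dist(row, c) for c in centers) for row in X)


if __name__ == "__main__":
    # n = 4 objects, p = 2 features, k = 2 clusters.
    X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
    position = [0.0, 1.0, 10.0, 1.0]  # centers (0, 1) and (10, 1), flattened
    print(clustering_fitness(X, position, k=2))  # 4.0
```

Under this encoding, any continuous optimizer that searches over vectors of length k * p can be applied to clustering by minimizing the fitness value.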

Let C= (C1, C2,…,

Datasets and parameters setting

To evaluate the performance of the HABC algorithm for data clustering, we compared it with ABC, CABC, PSO, CPSO, GA and the classic K-means algorithm on six real datasets selected from the UCI machine learning repository [23]. The datasets are as follows. N is the number of data records, P is the number of features for each record, and K is the number of clusters into which the data are divided.

Iris data (N =150, P=4, K=3): this dataset is with 150 random samples of flowers from the iris species setosa,

Conclusion

This paper presents a Hybrid Artificial Bee Colony (HABC) algorithm, in which the crossover operator of GA is introduced to improve the original ABC algorithm. With the new operator, information is exchanged fully between bees and the good individuals are utilized. In the early stage of the algorithm, the searching ability of the algorithm is enhanced, and at the end of the algorithm, as the difference between individuals decreases, the perturbation of the crossover operator decreases and can

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant nos. 61174164, 61003208, 61105067). The authors are very grateful to the anonymous reviewers for their valuable suggestions and comments, which improved the quality of this paper.

Xiaohui Yan received his B.S. degree in Industry Engineering from Huazhong University of Science and Technology, Wuhan, China, in 2007. He is currently pursuing the Ph.D. degree at Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China. His current research interests include swarm intelligence, bioinformatics and computational biology, neural networks, and the application of the intelligent optimization methods on data mining and scheduling.

References (39)

  • K. Polat et al., Breast cancer and liver disorders classification using artificial immune recognition system (AIRS) with performance evaluation by fuzzy resource allocation mechanism, Expert Syst. Appl. (2007)
  • M. Dorigo et al., Ant colony system: a cooperative learning approach to the traveling salesman problem, IEEE Trans. Evol. Comput. (1997)
  • J. Kennedy, R.C. Eberhart, Particle swarm optimization, In: Proceedings of the 1995 IEEE International Conference on...
  • K.M. Passino, Biomimicry of bacterial foraging for distributed optimization and control, IEEE Control Syst. Mag. (2002)
  • D. Karaboga, An idea based on honey bee swarm for numerical optimization, Technical Report-TR06, Erciyes University, Engineering Faculty, Comput. Eng. Dep. (2005)
  • D. Karaboga et al., A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm, J. Global Optim. (2007)
  • D. Karaboga et al., Artificial bee colony (ABC) optimization algorithm for solving constrained optimization problems, Lect. Notes Comput. Sci. (2007)
  • D. Karaboga et al., Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks, Modeling Decisions for Artif. Intell. (2007)
  • A. Baykasoglu et al., Artificial bee colony algorithm and its application to generalized assignment problem, Swarm Intelligence: Focus on Ant and Particle Swarm Optimization (2007)

    Yunlong Zhu is the Director of the Key Laboratory of Industrial Informatics, Shenyang Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. in 2005 from the Chinese Academy of Sciences, China. His research interests cover various aspects of enterprise information management, and he has ongoing interests in artificial intelligence, data mining, complex systems and related areas. Prof. Zhu's research has led to a dozen professional publications in these areas.

    Wenping Zou earned his B.S. degree in Computer Sciences and Technology from Shenyang University of Technology in Shenyang, Liaoning, China, in 2006. He is now pursuing his Ph.D. in Shenyang Institute of Automation of the Chinese Academy of Sciences. His current research interests include swarm intelligence, bioinformatics and computational biology, with an emphasis on evolutionary and other stochastic optimization methods.

    Wang Liang obtained his M.S. degree in automatic control from Northeast University, Shenyang, China, in 2009. He is currently pursuing the Ph.D. degree at Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China. His current research interests include data mining, social computing and decision support systems.
