Elsevier

Neurocomputing

Volume 116, 20 September 2013, Pages 317-325

Improving constrained clustering via swarm intelligence

https://doi.org/10.1016/j.neucom.2012.03.031

Abstract

By simulating the clustering behavior of real-world ant colonies, we propose a constrained ant clustering algorithm. The algorithm embeds a heuristic walk mechanism, based on random walk, to handle constrained clustering problems in which pairwise must-link and cannot-link constraints are given. Experimental results show that our approach is more effective than COP-Kmeans and the ant-based clustering algorithm on both synthetic and real datasets.

Introduction

Clustering is one of the most important data analysis methods in machine learning and data mining. Its purpose is to obtain a reasonable partition of a dataset. Clustering algorithms are usually regarded as unsupervised learning, because they do not use the target attribute during the learning process. However, in many real-world applications, some prior knowledge or domain information about the problem at hand is available. How to exploit these limited but important constraints has become an increasingly important question in recent years [3].

Swarm intelligence [7] is a popular research area in artificial intelligence. It refers to the high degree of collective intelligence exhibited by social insects that live in groups [13]. An ant colony can accomplish tasks that a single ant cannot, much as in human society. Inspired by behaviors such as breeding, foraging, nest construction, garbage collection, and territory defense performed by ants and other social insects, researchers have designed a series of algorithms that have been successfully applied to function optimization, combinatorial optimization, robotics, and other areas.

Some researchers have used artificial ants to tackle clustering problems and have achieved remarkable results. The earliest model in this line of work, proposed by Deneubourg [2] and often called the basic model (BM), explains how ants pile bodies together to form ant cemeteries. Its main idea is to pick up bodies in sparse areas and drop them where there are more bodies of the same type. Lumer and Faieta modified BM by introducing real data vectors and the similarity between data objects; the resulting model, often called LF [2], became the de facto ant clustering model. He and Hui used ant-based clustering (Ant-C) [9] algorithms to analyze gene expression data. El-Feghi et al. presented the AACA [10] algorithm, which takes aggregation pheromone and perception of the environment into account to improve the convergence rate. Mohamed Jafar Abul Hasan surveyed the evolution of clustering based on swarm intelligence [11]. Han and Shi improved the ant colony algorithm for fuzzy clustering in image segmentation [12]. Lutz Herrmann and Alfred Ultsch proposed an artificial life system [4] based on ESM [5], [8] to deal with the clustering problem. Recently, Xu et al. presented an ant sleeping model [6] to improve the performance of ant-based clustering.

In our previous work, we suggested a new ant clustering framework, RWAC (Random Walk Ant Clustering). In traditional ant-based clustering algorithms, the ants pick up and drop data objects to form clusters. RWAC differs in that each ant itself represents a data point, which is simpler and more direct. The ants walk randomly on the grid to find a place where they fit well enough to sleep. They perceive the fitness of their neighborhood to decide their action: stay and sleep, or wake up and leave. In this paper, we integrate a heuristic walk mechanism into RWAC to accelerate its convergence. The method extends easily to semi-supervised learning when domain knowledge is provided in the form of pairwise constraints; hence we call it CAC (Constrained Ant Clustering). CAC is a simple and effective ant-based semi-supervised clustering algorithm.

Constrained clustering

Cluster analysis, or clustering, is the assignment of a set of data points to subsets (called clusters) so that high intra-cluster similarity and low inter-cluster similarity are achieved. In real life, a small amount of domain information, such as labels or constraints, is often available when doing cluster analysis. Constraints stating that two data points must be assigned to the same cluster or to different clusters are sometimes easy to obtain. Utilizing the pairwise constraints can
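To make the two constraint types concrete, the following is a minimal sketch of how a cluster assignment can be checked against must-link and cannot-link pairs. The function name `count_violations` and the index-pair representation are assumptions chosen for illustration; they are not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two data points (illustrative representation)


def count_violations(
    labels: Sequence[int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> int:
    """Count the pairwise constraints that a cluster assignment breaks."""
    violated = 0
    # A must-link pair is violated when its two points land in different clusters.
    for i, j in must_link:
        if labels[i] != labels[j]:
            violated += 1
    # A cannot-link pair is violated when its two points land in the same cluster.
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            violated += 1
    return violated


if __name__ == "__main__":
    labels = [0, 0, 1, 1]              # cluster label of each of four points
    must_link = [(0, 1), (1, 2)]       # (1, 2) is split across clusters
    cannot_link = [(0, 3), (2, 3)]     # (2, 3) share a cluster
    print(count_violations(labels, must_link, cannot_link))
```

A constrained clustering method can either forbid any assignment with a non-zero count or treat the count as a penalty; which option a given algorithm takes is a design choice.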

Ant-based clustering models

This section briefly describes the first ant-based clustering model, the BM framework, outlining its principles and operations. Building on BM, we then introduce the RWAC clustering framework as an improvement that simulates the behavior of social groups more directly. Finally, we present the main idea of RWAC and its algorithm framework.
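For orientation, the pick-up and drop behavior of BM-style models is commonly written as two probabilities that depend on the local density of similar items. The sketch below uses that commonly cited form; the snippet above does not give the formulas, so the function names, the constants `k1` and `k2`, and their default values are assumptions for illustration only.

```python
def pick_probability(f: float, k1: float = 0.1) -> float:
    """Probability that an unladen ant picks up an item.

    f is the perceived fraction of similar items nearby (0 = isolated, 1 = dense).
    k1 is an assumed threshold constant. The probability is high in sparse areas.
    """
    return (k1 / (k1 + f)) ** 2


def drop_probability(f: float, k2: float = 0.3) -> float:
    """Probability that a laden ant drops its item.

    k2 is an assumed threshold constant. The probability is high in dense areas.
    """
    return (f / (k2 + f)) ** 2


if __name__ == "__main__":
    for density in (0.0, 0.1, 0.5, 0.9):
        print(density, pick_probability(density), drop_probability(density))
```

With these rules an isolated item is almost always picked up and almost never dropped, which is what gradually moves items from sparse areas into existing piles.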

CAC framework

In this section, we propose a constrained ant clustering framework that integrates a heuristic walk mechanism to accelerate the convergence of RWAC. With little effort, the framework can be extended to handle constrained clustering. First, we describe the heuristic walk mechanism in detail. Then we present the complete algorithm framework.

Experiments

In this section, we first compare the CAC framework without any constraints against both K-means and RWAC. Then, for each data set, we randomly generate different numbers of pairwise constraints and compare RWAC with CAC and COP-Kmeans [1], a state-of-the-art constrained clustering algorithm that adapts K-means to satisfy the provided constraints. Three evaluation criteria (purity, F-measure, and Rand index) are used to evaluate the performance of the algorithms. The experiments on 7 UCI
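Two of the evaluation criteria named above, purity and the Rand index, have standard textbook definitions that can be sketched in a few lines. The code below follows those standard definitions and is not taken from the paper; the function names are illustrative. The F-measure is omitted because several variants exist and the snippet does not say which one is used.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Sequence


def purity(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable]) -> float:
    """Fraction of points that belong to the majority true class of their cluster."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    per_cluster: Dict[Hashable, Counter] = {}
    for t, p in zip(true_labels, pred_labels):
        per_cluster.setdefault(p, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def rand_index(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable]) -> float:
    """Fraction of point pairs on which the two labelings agree (same vs. different)."""
    if len(true_labels) != len(pred_labels) or len(true_labels) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [0, 0, 1, 1, 1, 1]
    print(purity(truth, found), rand_index(truth, found))
```

Both measures lie in [0, 1] and equal 1 when the clustering matches the reference partition exactly. Purity tends to reward many small clusters, which is one reason to report it alongside a pair-counting measure such as the Rand index.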

Conclusions

In this paper, we propose a constrained ant clustering framework that embeds a heuristic walk mechanism and handles domain knowledge provided in the form of pairwise constraints. The experimental results show that our CAC framework outperforms the RWAC and COP-Kmeans algorithms on both artificial datasets and real-world UCI datasets.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant No. 61003180, No. 61070047 and No. 61103018; Natural Science Foundation of Education Department of Jiangsu Province under contract 09KJB20013; Natural Science Foundation of Jiangsu Province under contract BK2010318, BK2011442 and BK2012128; Research Innovation Program for College graduates of Jiangsu Province (CXLX12_0917); The New Century Talent Project of Yangzhou University.

Xiaohua Xu received his Ph.D. degree in computer science from Nanjing University of Aeronautics and Astronautics of China in 2008, and M.S. degree from Yangzhou University of China in 2005. His research interests include machine learning, evolutionary computation, and parallel algorithms.

References (15)

  • M. Dorigo et al., Ant algorithms and stigmergy, Future Gener. Comput. Syst. (2000)
  • Y. Han et al., An improved ant colony algorithm for fuzzy clustering in image segmentation, Neurocomputing (2007)
  • K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained k-means clustering with background knowledge, in:...
  • X. Zhu, Semi-supervised Learning with Graphs, Doctoral Dissertation, Carnegie Mellon University, CMU-LTI-05-192,...
  • L. Herrmann, A. Ultsch, An artificial life approach for semi-supervised Learning. Data analysis, machine learning and...
  • A. Ultsch, L. Herrmann, Automatic Clustering with U*C. Technical Report, Department of Mathematics and Computer...
  • Xu Xiaohua et al., A novel ant clustering algorithm based on cellular automata, Web Intell. Agent Syst. (2007)
There are more references available in the full text version of this article.

Lin Lu received his B.S. degree in 2011 at Yangzhou University. He is currently pursuing the M.S. degree at Yangzhou University. His research interests include swarm intelligence and machine learning.

Ping He received her M.S. degree from Yangzhou University of China in 2008. She is currently pursuing the Ph.D. degree at Nanjing University of Aeronautics and Astronautics of China. Her research interests include machine learning, data mining and bioinformatics.

Zhoujin Pan received his B.S. degree in 2009 at Yangzhou University. He is currently pursuing the M.S. degree at Yangzhou University. His research interests include swarm intelligence and machine learning.

Ling Chen is a professor in the Computer Science Department at Yangzhou University, Yangzhou, P.R. China. He did research for two years on parallel algorithms and architectures at the University of Pittsburgh, PA, first as a visiting scholar in 1986 and, later, as a visiting associate professor in 1992. His research interests include parallel algorithm design, artificial intelligence, and bioinformatics. Professor Chen is a member of IEEE Computer Society and Chinese Computer Society.
