Pattern Recognition Letters

Volume 31, Issue 3, 1 February 2010, Pages 226-233

A rough set approach to feature selection based on ant colony optimization

https://doi.org/10.1016/j.patrec.2009.10.013

Abstract

Rough set theory is one of the effective methods for feature selection that can preserve the meaning of the features. The essence of the rough set approach to feature selection is to find a subset of the original features. Since finding a minimal subset of the features is an NP-hard problem, it is necessary to investigate effective and efficient heuristic algorithms. Ant colony optimization (ACO) has been successfully applied to many difficult combinatorial problems such as quadratic assignment, the traveling salesman problem and scheduling. It is particularly attractive for feature selection, since no heuristic information is available that can guide the search to the optimal minimal subset every time; however, ants can discover the best feature combinations as they traverse the graph. In this paper, we propose a new rough set approach to feature selection based on ACO, which adopts mutual-information-based feature significance as heuristic information, and we give a novel feature selection algorithm. Jensen and Shen proposed an ACO-based feature selection approach in which each ant starts from a random feature; our approach starts from the feature core, which reduces the complete graph to a smaller one. To verify the efficiency of our algorithm, experiments are carried out on several standard UCI datasets. The results demonstrate that our algorithm provides an efficient way to find a minimal subset of the features.

Introduction

Feature selection can be viewed as one of the most fundamental problems in the field of machine learning. The main aim of feature selection is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features (Dash and Liu, 1997). In real-world problems, feature selection is a must due to the abundance of noisy, irrelevant or misleading features (Jensen, 2005); by removing them, learning-from-data techniques can benefit greatly. As Liu and Motoda (1998) point out, the motivation for feature selection in data mining and machine learning is to reduce the dimensionality of the feature space, improve the predictive accuracy of a classification algorithm, and improve the visualization and comprehensibility of the induced concepts.

In recent years, many feature selection methods have been proposed. There are two key issues in constructing a feature selection method: the search strategy and the evaluation measure. With respect to search strategies, complete (Somol et al., 2004), heuristic (Zhong and Dong, 2001) and random (Raymer et al., 2000, Lai et al., 2006) strategies have been proposed. With respect to evaluation measures, these methods can be roughly divided into two classes: classifier-specific (Kohavi, 1994, Guyon et al., 2002, Neumann et al., 2005, Gasca et al., 2006, Xie et al., 2006) and classifier-independent (Kira and Rendell, 1992, Modrzejewski, 1993, Dash and Liu, 2003). The former employ a learning algorithm to evaluate the goodness of selected features based on classification accuracy or contribution to the classification boundary, such as the so-called wrapper method (Kohavi, 1994) and weight-based algorithms (Guyon et al., 2002, Xie et al., 2006). The latter construct a classifier-independent measure to evaluate the significance of features, such as inter-class distance (Kira and Rendell, 1992), mutual information (Yao, 2003, Miao and Hou, 2004), dependence measures (Modrzejewski, 1993) and consistency measures (Dash and Liu, 2003).

Rough set theory (RST), proposed by Pawlak (1982), is a valid mathematical tool for handling imprecision, uncertainty and vagueness. As an effective method for feature selection, rough sets can preserve the meaning of the features. The theory has been widely applied in fields such as machine learning (Swiniarski and Skowron, 2003) and data mining (Duan et al., 2007, Mi et al., 2004). The essence of the rough set approach to feature selection is to find a subset of the original features. Rough set theory provides a mathematical framework for examining all possible feature subsets. Unfortunately, there are 2^N subsets of N features, so the number of possible subsets becomes very large as N grows, and exhaustively examining all subsets of features to select the optimal one is NP-hard. Previous methods employed an incremental hill-climbing (greedy) algorithm to select features (Hu, 1995, Deogun et al., 1998); however, this often leads to a non-minimal feature combination. Therefore, many researchers have turned to metaheuristics, such as genetic algorithms (GA) (Wróblewski, 1995, Zhai et al., 2002), tabu search (TS) (Hedar et al., 2006) and ant colony optimization (ACO) (Dorigo and Caro, 1999, Jensen and Shen, 2003, Jensen and Shen, 2004).
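
To make the hill-climbing idea concrete, the following is a minimal sketch of a greedy, dependency-driven reduct search in the spirit of the algorithms cited above (not their exact procedures). It assumes discrete-valued data given as rows indexed by integer feature positions and uses the standard rough set dependency measure, the fraction of objects in the positive region.

    from collections import defaultdict

    def partition(rows, features):
        """Group object indices into equivalence classes induced by `features`."""
        blocks = defaultdict(list)
        for i, row in enumerate(rows):
            blocks[tuple(row[f] for f in features)].append(i)
        return list(blocks.values())

    def dependency(rows, labels, features):
        """gamma(features): fraction of objects whose class is determined by `features`."""
        if not features:
            return 0.0
        pos = sum(len(block) for block in partition(rows, features)
                  if len({labels[i] for i in block}) == 1)
        return pos / len(rows)

    def greedy_reduct(rows, labels, all_features):
        """Hill-climbing: repeatedly add the feature with the largest gain in gamma."""
        reduct, best = set(), 0.0
        full = dependency(rows, labels, all_features)
        while best < full:
            gains = {f: dependency(rows, labels, reduct | {f})
                     for f in all_features - reduct}
            f_star = max(gains, key=gains.get)
            if gains[f_star] <= best:   # no further improvement: stop (result may be non-minimal)
                break
            reduct.add(f_star)
            best = gains[f_star]
        return reduct

Because each step keeps only the locally best feature, such a search can terminate with a subset that contains redundant features, which is exactly the weakness that motivates the metaheuristic approaches discussed next.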

ACO is a metaheuristic inspired by the behavior of real ants searching for the shortest path to food sources. Metaheuristic optimization based on ACO was introduced in the early 1990s by Dorigo and Caro (1999). ACO is a branch of a newly developed form of artificial intelligence called swarm intelligence, which studies “the emergent collective intelligence of groups of simple agents” (Bonabeau et al., 1999). The algorithm is inspired by the social behavior of ants: although ants have no sight, they are capable of finding the shortest route between a food source and their nest by means of chemical substances, called pheromones, that they deposit while moving. ACO was first used to solve the traveling salesman problem (TSP) (Dorigo et al., 1996) and has since been successfully applied to a large number of difficult problems such as the quadratic assignment problem (QAP) (Maniezzo and Colorni, 1999), routing in telecommunication networks, graph coloring, scheduling and feature selection. ACO is particularly attractive for feature selection since no heuristic information is available that can guide the search to the optimal minimal subset every time. On the other hand, if features are represented as a graph, ants can discover the best feature combinations as they traverse the graph.

Since most common methods for RST-based feature selection often lead to a non-minimal feature combination, in this paper we propose a novel feature selection algorithm based on rough sets and ACO, which adopts mutual-information-based feature significance as the heuristic information for ACO. We also introduce the concept of the feature core into the algorithm by requiring that all ants start from the core when they begin their search through the feature space; consequently, features near the core are selected by the ants more quickly. The performance of our algorithm is compared with that of RST-based algorithms and other metaheuristic-based algorithms.
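
To illustrate the kind of heuristic information used, the sketch below computes a mutual-information-based significance for a candidate feature given the features already selected. The particular definition SIG(a, B) = I(B ∪ {a}; D) − I(B; D), for decision D, is one common choice and is used here only as an illustration under the assumption of discrete features and labels; the exact formula in Section 4 may differ in detail.

    from collections import Counter
    from math import log2

    def entropy(keys):
        """Shannon entropy (in bits) of a list of discrete symbols."""
        n = len(keys)
        return -sum(c / n * log2(c / n) for c in Counter(keys).values())

    def joint_keys(rows, features):
        """Each object reduced to its value tuple on the given features."""
        return [tuple(row[f] for f in features) for row in rows]

    def mutual_information(rows, labels, features):
        """I(features; D) = H(features) + H(D) - H(features, D) for discrete data."""
        x = joint_keys(rows, features)
        xy = list(zip(x, labels))
        return entropy(x) + entropy(labels) - entropy(xy)

    def significance(rows, labels, selected, candidate):
        """Heuristic value (eta) of `candidate` given the features chosen so far."""
        return (mutual_information(rows, labels, list(selected) + [candidate])
                - mutual_information(rows, labels, list(selected)))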

This paper is organized as follows. In Sections 2 and 3, we introduce some preliminaries of rough set theory and ACO. In Section 4, we propose the approach to feature selection based on rough sets and ACO and give the pseudo-code of our algorithm. Experimental results are reported in Section 5, and Section 6 concludes the paper.

Preliminary concepts of RST

This section recalls some essential definitions from RST that are used for feature selection. Detailed description and formal definitions of the theory can be found in (Pawlak, 1982).

The notion of an information table has been studied by many authors as a simple knowledge representation method. Formally, an information table is a quadruple I = (U, A, V, f), where U is a nonempty finite set of objects, A is a nonempty finite set of features, V = ⋃_{a∈A} V_a is the union of the feature domains, with V_a denoting the domain (value set) of feature a, and f: U × A → V is an information function such that f(x, a) ∈ V_a for every a ∈ A and x ∈ U.
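
For concreteness, an information table can be represented directly as a small data structure. The sketch below is only illustrative; the objects, features and values are made up.

    from dataclasses import dataclass, field

    @dataclass
    class InformationTable:
        objects: list                           # U: nonempty finite set of objects
        features: list                          # A: nonempty finite set of features
        f: dict = field(default_factory=dict)   # f: (object, feature) -> value in V_a

        @property
        def domains(self):
            """V_a for each feature a; their union is V."""
            return {a: {self.f[(x, a)] for x in self.objects} for a in self.features}

    # Example: two objects described by two features.
    table = InformationTable(
        objects=["x1", "x2"],
        features=["headache", "temperature"],
        f={("x1", "headache"): "yes", ("x1", "temperature"): "high",
           ("x2", "headache"): "no",  ("x2", "temperature"): "normal"},
    )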

Ant colony optimization

In the real world, ants initially wander randomly and, upon finding food, return to their colony while laying down pheromone trails. If other ants find such a path, they are likely not to keep traveling at random, but instead to follow the trail, returning and reinforcing it if they eventually find food. Thus, when one ant finds a good (i.e. short) path from the colony to a food source, other ants are more likely to follow that path, and positive feedback eventually leads all the ants to follow a single path.
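
The positive-feedback mechanism can be seen in a toy simulation. This is only an illustration of the principle (two fixed paths, ten ants per step, made-up constants), not part of any feature selection algorithm.

    import random

    pheromone = {"short": 1.0, "long": 1.0}    # equal trails at the start
    length = {"short": 1.0, "long": 2.0}
    rho = 0.1                                  # evaporation rate

    for step in range(200):
        # Each of 10 ants picks a path with probability proportional to its pheromone.
        choices = random.choices(list(pheromone), weights=list(pheromone.values()), k=10)
        for path in pheromone:
            pheromone[path] *= (1.0 - rho)                          # evaporation
            pheromone[path] += choices.count(path) / length[path]   # deposit ~ 1/length

    print(pheromone)   # the short path ends up with almost all of the pheromone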

Feature selection based on rough sets and ant colony optimization

Following the standard ACO algorithmic scheme for combinatorial optimization problems, Jensen and Shen proposed a method for feature selection based on rough sets and ACO (JSACO) (Jensen and Shen, 2003). The basic procedure of JSACO is as follows: given a colony of k artificial ants to search through the feature space, these k ants perform a number of iterations; during every iteration t, each ant starts from a random feature, then selects the best route, and the pheromone is updated.
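
A schematic version of this loop is sketched below, using the standard ACO transition rule in which a feature is chosen with probability proportional to τ^α · η^β. The functions eta (heuristic information) and is_reduct (stopping test) are placeholders, the update treats ρ as a trail-persistence factor, and none of this is the authors' exact pseudo-code.

    import random

    def construct_subset(features, tau, eta, alpha, beta, is_reduct):
        """One ant builds a candidate subset, starting from a random feature (JSACO-style)."""
        current = {random.choice(list(features))}
        while not is_reduct(current):
            candidates = [f for f in features if f not in current]
            if not candidates:                      # every feature already chosen
                break
            weights = [tau[f] ** alpha * (eta(current, f) + 1e-6) ** beta
                       for f in candidates]         # pheromone * heuristic desirability
            current.add(random.choices(candidates, weights=weights, k=1)[0])
        return current

    def aco_feature_selection(features, eta, is_reduct, k=10, cycles=100,
                              alpha=1.0, beta=0.01, rho=0.9, init_tau=0.5):
        """Skeleton of the iterative loop: k ants per cycle, keep the smallest subset found."""
        tau = {f: init_tau for f in features}
        best = set(features)
        for _ in range(cycles):
            subsets = [construct_subset(features, tau, eta, alpha, beta, is_reduct)
                       for _ in range(k)]
            best = min(subsets + [best], key=len)
            for f in tau:                           # persist old trail, reward the best subset
                tau[f] = rho * tau[f] + (1.0 / len(best) if f in best else 0.0)
        return best

In the approach proposed in this paper, construct_subset would instead be seeded with the feature core rather than a random feature, and eta would be the mutual-information-based significance.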

Experimental results

In this section, we demonstrate the performance of our algorithm RSFSACO given in Section 4. The algorithm was tested on a personal computer running Windows XP with a 2.0 GHz processor and 1 GB of memory. In our experiments, we set the parameters α = 1, β = 0.01, ρ = 0.9, q = 0.1 and ε = 0.001; the initial pheromone was set to 0.5 with a small random perturbation added, the number of ants was half the number of features, and the maximum number of cycles was 100. These parameters are determined based on
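
For reference, the setting reported above can be collected into a small configuration sketch. The magnitude of the random perturbation on the initial pheromone values is our own illustrative choice, and q and ε are listed as reported without further interpretation here.

    import random
    from dataclasses import dataclass

    @dataclass
    class RSFSACOConfig:
        alpha: float = 1.0      # weight of the pheromone trail
        beta: float = 0.01      # weight of the heuristic information
        rho: float = 0.9        # pheromone update factor
        q: float = 0.1          # as reported in the experiments
        eps: float = 0.001      # as reported in the experiments
        max_cycles: int = 100

    def initial_setting(n_features, config=None):
        """Ant count and initial pheromone levels for a dataset with n_features features."""
        config = config or RSFSACOConfig()
        n_ants = n_features // 2                      # half the number of features
        tau0 = [0.5 + random.uniform(-0.05, 0.05)     # 0.5 plus a small perturbation (assumed size)
                for _ in range(n_features)]
        return n_ants, tau0, config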

Conclusion

This paper discusses the shortcomings of conventional hill-climbing rough set approaches to feature selection. These techniques often fail to find optimal reducts, as no perfect heuristic can guarantee optimality. On the other hand, complete searches are not feasible for even medium-sized datasets. ACO approaches therefore provide a promising feature selection mechanism.

We proposed a novel feature selection technique based on rough sets and ACO. ACO has the ability to quickly converge. It has a

Acknowledgements

The research is supported by the National Natural Science Foundation of China under Grant Nos. 60775036 and 60475019, and the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20060247039.

References (36)

  • M. Dorigo et al. Ant system: Optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybernet. Part B (1996).
  • Q.G. Duan et al. Personalized Web retrieval based on rough-fuzzy method. J. Comput. Inform. Systems (2007).
  • I. Guyon et al. Gene selection for cancer classification using support vector machines. Machine Learn. (2002).
  • A. Hedar, J. Wang, M. Fukushima. Tabu search for attribute reduction in rough set theory. Technical Report... (2006).
  • X. Hu. Knowledge discovery in databases: An attribute oriented rough set approach. Ph.D. Thesis, Regina... (1995).
  • R. Jensen. Combining rough and fuzzy sets for feature selection. Ph.D. Thesis, Univ. of... (2005).
  • R. Jensen, Q. Shen. Finding rough set reducts with ant colony optimization. In: Proceedings of the 2003 UK Workshop... (2003).
  • R. Jensen et al. Semantics-preserving dimensionality reduction: Rough and fuzzy-rough-based approaches. IEEE Trans. Knowledge Data Eng. (2004).