A rough set approach to feature selection based on ant colony optimization
Introduction
Feature selection is one of the most fundamental problems in machine learning. Its main aim is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features (Dash and Liu, 1997). In real-world problems, feature selection is essential because of the abundance of noisy, irrelevant or misleading features (Jensen, 2005). Removing such features can greatly benefit techniques that learn from data. As Liu and Motoda (1998) pointed out, the motivation for feature selection in data mining and machine learning is to reduce the dimensionality of the feature space, improve the predictive accuracy of a classification algorithm, and improve the visualization and comprehensibility of the induced concepts.
In recent years, many feature selection methods have been proposed. There are two key issues in constructing a feature selection method: the search strategy and the evaluation measure. With respect to search strategies, complete (Somol et al., 2004), heuristic (Zhong and Dong, 2001) and random (Raymer et al., 2000, Lai et al., 2006) strategies have been proposed. With respect to evaluation measures, these methods can be roughly divided into two classes: classifier-specific (Kohavi, 1994, Guyon et al., 2002, Neumann et al., 2005, Gasca et al., 2006, Xie et al., 2006) and classifier-independent (Kira and Rendell, 1992, Modrzejewski, 1993, Dash and Liu, 2003). The former employs a learning algorithm to evaluate the goodness of the selected features based on classification accuracy or contribution to the classification boundary, as in the so-called wrapper method (Kohavi, 1994) and weight-based algorithms (Guyon et al., 2002, Xie et al., 2006). The latter constructs a classifier-independent measure to evaluate the significance of features, such as inter-class distance (Kira and Rendell, 1992), mutual information (Yao, 2003, Miao and Hou, 2004), dependence measure (Modrzejewski, 1993) and consistency measure (Dash and Liu, 2003).
Rough set theory (RST), proposed by Pawlak (1982), is a valid mathematical tool for handling imprecision, uncertainty and vagueness. As an effective approach to feature selection, rough sets can preserve the meaning of the features. RST has been widely applied in many fields such as machine learning (Swiniarski and Skowron, 2003), data mining (Duan et al., 2007), etc. (Mi et al., 2004). The essence of the rough set approach to feature selection is to find a subset of the original features. Rough set theory provides a mathematical tool that can be used to find all possible feature subsets. Unfortunately, the number of possible subsets is always very large when $N$ is large, because there are $2^N$ subsets for $N$ features. Hence selecting the optimal subset by exhaustively examining all subsets of features is NP-hard. Previous methods employed an incremental hill-climbing (greedy) algorithm to select features (Hu, 1995, Deogun et al., 1998). However, this often led to a non-minimal feature combination. Therefore, many researchers have shifted to metaheuristics, such as the genetic algorithm (GA) (Wróblewski, 1995, Zhai et al., 2002), tabu search (TS) (Hedar et al., 2006) and ant colony optimization (ACO) (Dorigo and Caro, 1999, Jensen and Shen, 2003, Jensen and Shen, 2004), etc.
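To make the size of this search space concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes the standard rough-set dependency degree (the fraction of objects whose equivalence class is consistent with a single decision) as the subset quality measure. The toy decision table and the names `dependency` and `minimal_reducts` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy decision table: (condition feature values, decision).
ROWS = [
    ((0, 0, 1), "no"),
    ((0, 1, 1), "yes"),
    ((1, 0, 0), "no"),
    ((1, 1, 0), "yes"),
    ((1, 1, 1), "yes"),
]
N_FEATURES = 3


def dependency(subset):
    """Fraction of objects whose equivalence class under `subset`
    contains a single decision value (assumed quality measure)."""
    classes = {}
    for values, decision in ROWS:
        key = tuple(values[i] for i in subset)
        classes.setdefault(key, []).append(decision)
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(ROWS)


def minimal_reducts():
    """Scan subsets by increasing size and return the smallest ones that
    reach the dependency of the full feature set (exhaustive search)."""
    full = dependency(range(N_FEATURES))
    for size in range(N_FEATURES + 1):
        found = [s for s in combinations(range(N_FEATURES), size)
                 if dependency(s) == full]
        if found:
            return found
    return []


print(2 ** N_FEATURES)    # number of candidate subsets for 3 features
print(minimal_reducts())  # smallest subsets preserving full dependency
```

The loop visits up to every one of the 2^N subsets, which is why exhaustive search is only workable for very small feature sets and why greedy or metaheuristic strategies are used instead.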
ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest path to food sources. An ACO-based metaheuristic optimization algorithm was introduced in the early 1990s by Dorigo and Caro (1999). ACO is a branch of a newly developed form of artificial intelligence called swarm intelligence, which studies “the emergent collective intelligence of groups of simple agents” (Bonabeau et al., 1999). The ACO algorithm is inspired by the social behavior of ants. Although ants have no sight, they are capable of finding the shortest route between a food source and their nest by means of a chemical substance called pheromone, which they deposit as they move. The ACO algorithm was first used to solve the traveling salesman problem (TSP) (Dorigo et al., 1996). It has since been successfully applied to a large number of difficult problems such as the quadratic assignment problem (QAP) (Maniezzo and Colorni, 1999), routing in telecommunication networks, graph coloring problems, scheduling, feature selection, etc. ACO is particularly attractive for feature selection since there is no heuristic information that can guide the search to the optimal minimal subset every time. On the other hand, if features are represented as a graph, ants can discover the best feature combinations as they traverse the graph.
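The pheromone-guided traversal of a feature graph can be sketched as follows. This is a minimal illustration of the classic Ant System random-proportional rule and evaporation update, assumed here for exposition; it is not necessarily the exact rule used in this paper. The parameter names `alpha`, `beta` and `rho`, the helper names, and the heuristic values are all hypothetical.

```python
import random


def choose_next(current, candidates, pheromone, heuristic,
                alpha=1.0, beta=1.0, rng=random):
    """Pick the next feature with probability proportional to
    pheromone[(current, j)] ** alpha * heuristic[j] ** beta."""
    weights = [pheromone[(current, j)] ** alpha * heuristic[j] ** beta
               for j in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def update_pheromone(pheromone, tours, rho=0.1):
    """Evaporate every trail, then deposit pheromone on the edges of each
    tour; tours with fewer features deposit more per edge."""
    for edge in pheromone:
        pheromone[edge] *= (1.0 - rho)
    for tour in tours:
        deposit = 1.0 / len(tour)
        for a, b in zip(tour, tour[1:]):
            pheromone[(a, b)] += deposit
            pheromone[(b, a)] += deposit


# Usage on a hypothetical 4-feature graph with uniform initial pheromone.
features = [0, 1, 2, 3]
pheromone = {(i, j): 0.5 for i in features for j in features if i != j}
heuristic = {0: 0.1, 1: 0.6, 2: 0.2, 3: 0.1}  # made-up desirability values
rng = random.Random(0)
nxt = choose_next(0, [1, 2, 3], pheromone, heuristic, rng=rng)
update_pheromone(pheromone, tours=[[0, 1], [0, 2, 3]])
```

Rewarding shorter tours with a larger deposit is one simple way to bias later ants toward small feature subsets.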
Most common methods for RST-based feature selection often lead to a non-minimal feature combination. In this paper we therefore propose a novel feature selection algorithm based on rough sets and ACO, which adopts mutual-information-based feature significance as the heuristic information for ACO. We also introduce the concept of the feature core into the algorithm by requiring that all ants start from the core when they begin their search through the feature space. As a result, features near the core will be selected by the ants more quickly. The performance of our algorithm is compared with that of RST-based algorithms and other metaheuristic-based algorithms.
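As an illustration of a mutual-information-based heuristic, the sketch below scores a candidate feature by how much it reduces the conditional entropy of the decision when added to an already selected subset B, that is, H(D | B) - H(D | B ∪ {a}). This is a common definition assumed here and may differ from the exact one given in Section 4. The toy table and the function names are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical toy decision table: (condition feature values, decision).
ROWS = [
    ((0, 0, 1), "no"),
    ((0, 1, 1), "yes"),
    ((1, 0, 0), "no"),
    ((1, 1, 0), "yes"),
    ((1, 1, 1), "yes"),
]


def conditional_entropy(subset):
    """H(D | subset): entropy of the decision inside each equivalence
    class, weighted by that class's share of the objects."""
    groups = {}
    for values, decision in ROWS:
        key = tuple(values[i] for i in subset)
        groups.setdefault(key, Counter())[decision] += 1
    total = len(ROWS)
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            h -= (size / total) * (c / size) * log2(c / size)
    return h


def significance(feature, selected):
    """Assumed significance: gain in mutual information with the decision
    obtained by adding `feature` to the `selected` subset."""
    extended = tuple(selected) + (feature,)
    return conditional_entropy(selected) - conditional_entropy(extended)


# Feature 1 determines the decision in this toy table, so it scores highest.
scores = {a: significance(a, ()) for a in range(3)}
best = max(scores, key=scores.get)
```

In an ACO setting, scores of this kind could play the role of the heuristic desirability of each candidate feature, with the pheromone supplying the learned component.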
This paper is organized as follows. In Sections 2 and 3, we introduce some preliminaries of rough set theory and ACO. In Section 4, we propose the approach to feature selection based on rough sets and ACO and give the pseudo-code of our algorithm. Experimental results are given in Section 5, and Section 6 concludes the paper.
Preliminary concepts of RST
This section recalls some essential definitions from RST that are used for feature selection. A detailed description and formal definitions of the theory can be found in Pawlak (1982).
The notion of an information table has been studied by many authors as a simple knowledge representation method. Formally, an information table is a quadruple $(U, A, V, f)$, where: $U$ is a nonempty finite set of objects, $A$ is a nonempty finite set of features, $V$ is the union of feature domains such that $V = \bigcup_{a \in A} V_a$ for
Ant colony optimization
In the real world, ants (initially) wander randomly, and upon finding food return to their colony while laying down pheromone trails. If other ants find such a path, they are likely not to keep traveling at random, but to instead follow the trail, returning and reinforcing it if they eventually find food. Thus, when one ant finds a good (i.e. short) path from the colony to a food source, other ants are more likely to follow that path, and positive feedback eventually leads all the ants
Feature selection based on rough sets and ant colony optimization
Following the standard ACO algorithmic scheme for combinatorial optimization problems, Jensen and Shen proposed a method for feature selection based on rough sets and ACO (JSACO) (Jensen and Shen, 2003). The basic procedure of JSACO is as follows: a colony of artificial ants searches through the feature space over a number of iterations. During every iteration, each ant starts from a random feature, then selects the best route and the pheromone is updated. The
Experimental results
In this section, we demonstrate the performance of our algorithm RSFSACO given in Section 4. The algorithm is tested on a personal computer running Windows XP with a 2.0 GHz processor and 1 GB of memory. In our experiments, we set the parameters , , , , , and the initial pheromone was set to 0.5 with a small random perturbation added, the number of ants was half the number of features and the maximum number of cycles was 100. These parameters are determined based on
Conclusion
This paper discusses the shortcomings of conventional hill-climbing rough set approaches to feature selection. These techniques often fail to find optimal reducts, as no perfect heuristic can guarantee optimality. On the other hand, complete searches are not feasible for even medium sized datasets. So, ACO approaches provide a promising feature selection mechanism.
We proposed a novel feature selection technique based on rough sets and ACO. ACO has the ability to quickly converge. It has a
Acknowledgements
The research is supported by the National Natural Science Foundation of China under Grant Nos. 60775036 and 60475019, and the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20060247039.
References (36)
- Dash and Liu, 1997. Feature selection for classification. Intell. Data Anal.
- Dash and Liu, 2003. Consistency-based search in feature selection. Artif. Intell.
- Gasca et al., 2006. Eliminating redundancy and irrelevance using a new MLP-based feature selection method. Pattern Recognition.
- Lai et al., 2006. Random subspace method for multivariate feature selection. Pattern Recognition Lett.
- Mi et al., 2004. Approaches to knowledge reduction based on variable precision rough set model. Inform. Sci.
- Swiniarski and Skowron, 2003. Rough set methods in feature selection and recognition. Pattern Recognition Lett.
- Zhai et al., 2002. Feature extraction using rough set theory and genetic algorithms: An application for the simplification of product quality evaluation. Comput. Indust. Eng.
- Bonabeau et al., 1999. Swarm Intelligence: From Natural to Artificial Systems.
- Deogun et al., 1998. Feature selection and effective classifiers. J. ASIS.
- Dorigo, M., Caro, G.D., 1999. Ant colony optimization: A new meta-heuristic. In: Proc. Congress on Evolutionary...
- Dorigo et al., 1996. The Ant system: Optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybernet. Part B.
- Duan et al., 2007. Personalized Web retrieval based on rough-fuzzy method. J. Comput. Inform. Systems.
- Guyon et al., 2002. Gene selection for cancer classification using support vector machines. Machine Learn.
- Jensen and Shen, 2004. Semantics-preserving dimensionality reduction: Rough and fuzzy-rough-based approaches. IEEE Trans. Knowledge Data Eng.