Attribute weighting via genetic algorithms for attribute weighted artificial immune system (AWAIS) and its application to heart disease and liver disorders problems
Introduction
As new, complex and hard problems emerge, existing problem-solving tools become insufficient and new tools are developed to meet the need. Techniques such as artificial neural networks (ANN) and genetic algorithms (GA) are effective methods developed in response to that need, and they brought the concept of artificial intelligence (AI) to the problem-solving field.
Artificial immune system (AIS) is a relatively new AI technique that can be applied to various problem areas such as classification, virus detection, robotics and optimization. Despite its generality across this wide range of application areas, few studies have obtained better results than existing methods. Hart and Timmis examined the reason for this and concluded their paper by emphasizing the need for correct modeling in the correct application field (Hart & Timmis, 2005). As they stated, the natural immune system has the potential to solve complex problems, but this potential has not been realized so far because of insufficient modeling in inappropriate application areas. In our previous studies, we tried to address a deficiency in proposed AIS algorithms by developing an attribute weighted artificial immune system (AWAIS) (Şahan et al., 2004, Şahan et al., 2005).
In developing an AIS, one needs a representation scheme to model the immune system units. The shape-space representation method was developed for this purpose and has been used in almost every AIS (Perelson & Oster, 1979). Although it is very plausible from a biological perspective, it carries no classification bias when a pure distance criterion such as Euclidean or Manhattan distance is used. However, classification bias is the backbone of a classifier system. Thus, some kind of bias must be built into the classifier, either in the classification scheme or in the representation method. The basic AIS algorithm resembles instance-based learning (IBL) algorithms in that a distance function is used to determine the dissimilarity between the object to be classified and the system units. With a few exceptions (Watkins, 2001, Carter, 2000), many AIS generate their units without any bias. One option is to introduce this bias through the representation scheme. Our previous study (Şahan et al., 2004) tried to do this by generating weights for attributes and then using these weights in a simple AIS classifier. In that study, we calculated the weights from statistical information in the dataset, such as the standard deviation and mean value of the attributes. We applied AWAIS to two medical diagnosis problems, heart disease and diabetes classification, using datasets from the UCI Machine Learning repository (http://www.phys.uni.torun.pl/kmk/projects/datasets.html#Sheart). Compared to other methods in the literature, AWAIS obtained reasonable results, but not better than the state-of-the-art works (Şahan, Kodaz, Güneş, & Polat, 2005). In this study, we used a GA to determine the weights, which were then used in AIS classification. Successful results obtained from our system inspired us to apply this method to other real-world classification problems.
A classification accuracy of 87.43% was reached for the Statlog Heart Disease dataset, and the result for the BUPA Liver Disorders dataset was also good, with an accuracy of 85.21%. For these datasets, GA-AWAIS outperformed AWAIS by a considerable margin. A comparison with the literature showed that GA-AWAIS reached the highest classification accuracy on the Statlog Heart Disease and BUPA Liver Disorders datasets among the classifiers applied to them.
The paper is organized as follows. The next section gives background information on attribute weighting and AIS. The following section introduces AWAIS, and the GA-AWAIS configuration used in this paper is explained in Section 4. Section 5 presents the application results, and Section 6 concludes the paper.
Attribute weighting in pattern recognition
In a classification process, attributes may contribute unequally to the result. Assigning weights to attributes can correct this imbalance and improve classification accuracy. Feature weighting has been used in pattern recognition applications for a long time.
Wettschereck, Aha, and Mohri (1997) reviewed feature weighting methods along five dimensions: bias, weight space, representation, generality and knowledge. Among
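The effect of attribute weights on a distance criterion can be illustrated with a minimal sketch. The function name `weighted_distance`, the toy vectors and the weight values below are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: each squared attribute difference
    is scaled by that attribute's weight before summing."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Hypothetical toy vectors: attribute 0 is informative, attribute 1 is noise.
a = [1.0, 0.0]
b = [1.0, 10.0]   # same as `a` on the informative attribute
c = [3.0, 0.0]    # differs from `a` on the informative attribute

uniform = [1.0, 1.0]    # pure (unweighted) Euclidean distance
informed = [1.0, 0.0]   # assumed weights that ignore the noisy attribute

# With uniform weights the noisy attribute dominates and `b` looks far from `a`.
print(weighted_distance(a, b, uniform), weighted_distance(a, c, uniform))
# With the informed weights `b` coincides with `a`, while `c` stays distant.
print(weighted_distance(a, b, informed), weighted_distance(a, c, informed))
```

Setting a weight to zero removes an attribute entirely, so feature selection can be seen as a special case of feature weighting.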
AWAIS
In systems that use a distance criterion, shape-space related problems may arise when irrelevant attributes are present. Şahan et al. (2004) aimed to reach higher classification accuracy by assigning weights to the attributes that are important for classification. The weights for the features were calculated from statistical properties of the training set and were then used in the distance calculation. By doing so, a system named attribute weighted
GA-AWAIS
Genetic algorithms (GA), one of the nature-inspired optimization methods, are a branch of evolutionary algorithms that model biological processes to optimize rather complex cost functions. The method was proposed by John Holland (1975) and was popularized by one of his students, David Goldberg. GA is based on modeling the genetic processes of living organisms. These processes rest on the evolution of the individuals in a population through crossover and mutation. These
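The idea of evolving attribute weights through crossover and mutation can be sketched as follows. This is a generic illustration, not the GA-AWAIS configuration of the paper: the truncation selection, one-point crossover, Gaussian mutation, parameter values, toy data and the use of a weighted nearest-neighbour classifier as a stand-in for the AIS classifier are all assumptions.

```python
import random

def weighted_sq_dist(x, y, w):
    """Squared weighted Euclidean distance between two attribute vectors."""
    return sum(wi * (a - b) ** 2 for a, b, wi in zip(x, y, w))

def loo_accuracy(weights, samples, labels):
    """Leave-one-out accuracy of a weighted 1-nearest-neighbour classifier
    (used here as the fitness of a weight vector)."""
    correct = 0
    for i, x in enumerate(samples):
        nearest = min((j for j in range(len(samples)) if j != i),
                      key=lambda j: weighted_sq_dist(x, samples[j], weights))
        correct += labels[nearest] == labels[i]
    return correct / len(samples)

def evolve_weights(samples, labels, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Evolve a weight vector in [0, 1]^n that maximizes loo_accuracy."""
    rng = random.Random(seed)
    n = len(samples[0])

    def fitness(w):
        return loo_accuracy(w, samples, labels)

    # Each individual is one candidate weight vector.
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0   # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation, clamped back into [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Hypothetical toy data: attribute 0 separates the classes, attribute 1 is noise.
samples = [[0.0, 5.0], [0.1, -5.0], [0.2, 7.0], [0.05, -8.0],
           [1.0, 5.1], [1.1, -5.1], [0.9, 7.1], [1.05, -8.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

best = evolve_weights(samples, labels)
print("evolved weights:", best)
print("fitness of evolved weights:", loo_accuracy(best, samples, labels))
print("fitness of uniform weights:", loo_accuracy([1.0, 1.0], samples, labels))
```

On this toy data, uniform weights misclassify every sample because the noisy attribute dominates the distance, whereas any weight vector that sufficiently suppresses it classifies all samples correctly. A fixed seed is used so that repeated runs give the same result.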
Application and results
To evaluate the performance of our new configuration, GA-AWAIS, we applied it to two well-known medical classification problems: heart disease and liver disorders classification. We took the corresponding datasets, Statlog Heart Disease and BUPA Liver Disorders, from the UCI machine learning database. We preferred these datasets because they are commonly used by researchers who test their proposed systems on medical classification problems. Also,
Conclusion
Although AIS offers a new tool for solving complex problems with newly developed algorithms, we cannot say that it clearly outperforms other systems, especially in the classification field. A main problem of AIS classifiers is that many of them carry no classification bias. Most of them use a pure distance criterion to calculate the affinity of system units to the presented data. However, today, many systems using some kind of distance function use attribute
Acknowledgement
This study is supported by the Scientific Research Projects of Selçuk University (project no. 05401069).
References (19)
- et al. Theoretical studies of clonal selection: Minimal antibody repertoire size and reliability of self-nonself discrimination. Journal of Theoretical Biology (1979)
- et al. On the convergence of multiattribute weighting methods. European Journal of Operational Research (2001)
- et al. Cellular and molecular immunology (2000)
- et al. Approximate reduct computation by rough set based attribute weighting. IEEE International Conference on Granular Computing (2005)
- et al. MACLAW: A modular approach for clustering with local attribute weighting. Pattern Recognition Letters (2006)
- The immune system as a model for pattern recognition and classification. Journal of the American Medical Informatics Association (2000)
- et al. Evolutionary algorithms in engineering applications (1997)
- et al. Artificial immune systems: A new computational intelligence approach (2002)
(2002) - Frigui, H., & Nasraoui O. (2002). A fast algorithm for discovering categories and attribute relevance in web data. In...