Attribute weighting via genetic algorithms for attribute weighted artificial immune system (AWAIS) and its application to heart disease and liver disorders problems

https://doi.org/10.1016/j.eswa.2007.09.063

Abstract

An increasing number of algorithms and applications are emerging in the field of artificial immune systems (AIS). Although this growth has produced successful studies, AIS is still not an effective problem solver in some fields such as classification, regression, and pattern recognition. So far, many of the developed AIS algorithms have used a distance or similarity measure, as is the case in instance-based learning (IBL) algorithms. The efficiency of IBL algorithms lies mainly in the weighting scheme they use. This weighting idea is the focus of our study: we used genetic algorithms (GA) to determine the weights of attributes and then used these weights in our previously developed attribute weighted artificial immune system (AWAIS). We evaluated the performance of the new configuration (GA-AWAIS) on two medical datasets, Statlog Heart Disease and BUPA Liver Disorders, and compared it with AWAIS on the same problems. The obtained classification accuracy was very good with respect to both AWAIS and other common classifiers in the literature.

Introduction

As new, complex, and hard problems emerge, current problem-solving tools are becoming insufficient, and new tools are being developed to meet this need. Techniques such as artificial neural networks (ANN) and genetic algorithms (GA) are effective methods developed as a result of that need, and they brought the concept of artificial intelligence (AI) to the problem-solving field.

Artificial immune system (AIS) is a new AI technique that can be applied to various branches of the problem space such as classification, virus detection, robotics, and optimization. Despite its generality across this wide range of application areas, few studies have obtained better results than the available methods. Hart and Timmis examined the reason for this and concluded their paper by emphasizing the need for correct modeling in the correct application field (Hart & Timmis, 2005). As they stated, the natural immune system has the potential to solve complex problems, but this potential has not been utilized so far because of insufficient modeling in inappropriate application areas. In our previous studies, we tried to address a deficiency in proposed AIS algorithms by developing an attribute weighted artificial immune system (AWAIS) (Şahan et al., 2004, Şahan et al., 2005).

In developing an AIS, one needs a representation scheme to model immune system units in the system. The shape-space representation method was developed for this purpose and has been used in almost every AIS (Perelson & Oster, 1979). Although it is very plausible from the biological perspective, it does not carry any classification bias if a pure distance criterion such as Euclidean or Manhattan distance is used. However, classification bias is the backbone of a classifier system. Thus, some kind of bias must be built into a classifier, either in the classification scheme or in the representation method. The basic AIS algorithm resembles IBL algorithms in that a distance function is used to determine the dissimilarity between the object to be classified and the system units. Apart from a few (Watkins, 2001, Carter, 2000), many AIS generate their units without any bias. One option is to introduce this bias through the representation scheme. Our previous study (Şahan et al., 2004) tried to do this by generating weights for attributes and then using these weights in a simple AIS classifier. In that study, we calculated the weights using statistical information in the dataset, such as the standard deviation and mean value of attributes. We applied AWAIS to two medical diagnosis problems, Heart Disease and Diabetes classification, using datasets in the UCI Machine Learning repository (http://www.phys.uni.torun.pl/kmk/projects/datasets.html#Sheart). Compared with other methods in the literature, AWAIS obtained reasonable results, but not better than the state-of-the-art works (Şahan, Kodaz, Güneş, & Polat, 2005). In this study, we used a GA to determine the weights, which were then used in AIS classification. The successful results obtained from our system inspired us to apply this method to other real-world classification problems.
A classification accuracy of 87.43% was reached for Statlog Heart Disease, and the result for BUPA Liver Disorders was also good, with an accuracy of 85.21%. For these datasets, GA-AWAIS outperformed AWAIS by a considerable margin. The comparison was also conducted against the literature, and it was seen that GA-AWAIS reached the highest classification accuracy for the Statlog Heart Disease and BUPA Liver Disorders datasets among the classifiers applied to these datasets.

The paper is organized as follows. The next section gives background information on attribute weighting and AIS. The following section introduces AWAIS, and the GA-AWAIS configuration used in this paper is explained in Section 4. Application results are given in Section 5, and Section 6 concludes the paper.

Section snippets

Attribute weighting in pattern recognition

In a classification process, the contribution of each attribute may be different. Giving weights to attributes may therefore correct this imbalance and improve classification accuracy. Feature weighting has been used in pattern recognition applications for a long time.
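As an illustration of the idea, the following minimal Python sketch shows how attribute weights can change the outcome of a distance-based classifier. It is not the formulation used in this paper: the weighted Euclidean distance, the function names `weighted_distance` and `nearest_label`, and the toy data are all assumptions chosen for the example.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: attributes with larger weights count more."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def nearest_label(query, samples, labels, weights):
    """Return the label of the stored sample closest to `query`."""
    best = min(range(len(samples)),
               key=lambda i: weighted_distance(query, samples[i], weights))
    return labels[best]


# Toy data (made up for illustration): attribute 0 is informative,
# attribute 1 is irrelevant noise on a larger scale.
samples = [(0.0, 9.0), (1.0, 0.0)]
labels = ["healthy", "sick"]
query = (0.1, 1.0)

# Equal weights: the noisy attribute dominates the distance.
unweighted = nearest_label(query, samples, labels, (1.0, 1.0))
# Zero weight on the noisy attribute: only the informative one counts.
weighted = nearest_label(query, samples, labels, (1.0, 0.0))

print(unweighted, weighted)
```

With equal weights the query is assigned to the sample that happens to be close on the irrelevant attribute, whereas removing that attribute's influence lets the informative attribute decide.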

Wettschereck, Aha, and Mohri (1997) reviewed feature weighting methods along five dimensions: bias, weight space, representation, generality, and knowledge. Among

AWAIS

In systems that use a distance criterion, some shape-space related problems may arise when irrelevant attributes are present. Şahan et al. (2004) aimed to reach higher classification accuracy by assigning weights to the attributes that are important for classification. This was done by using some statistical properties of the training set to calculate weights for the features and then using these weights in the distance calculation. By doing so, a system named attribute weighted

GA-AWAIS

Genetic algorithms (GA), one of the natural optimization methods, are a branch of evolutionary algorithms that model biological processes to optimize rather complex cost functions. The method was proposed by John Holland (1975) and popularized by one of his students, David Goldberg. GA is based on modeling the genetic processes in living organisms. These processes rest on the evolution of individuals belonging to a population through crossover and mutation. These
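The general idea of evolving attribute weights through crossover and mutation can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the GA-AWAIS configuration itself: the fitness function (leave-one-out accuracy of a weighted 1-nearest-neighbour classifier), tournament selection, single-point crossover, Gaussian mutation, elitism, all parameter values, and the toy data are choices made for this example, and the names `loo_accuracy` and `evolve_weights` are hypothetical.

```python
import random


def weighted_sq_distance(x, y, w):
    """Squared weighted Euclidean distance between two samples."""
    return sum(wi * (a - b) ** 2 for a, b, wi in zip(x, y, w))


def loo_accuracy(weights, X, y):
    """Leave-one-out accuracy of a weighted 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others,
                      key=lambda j: weighted_sq_distance(X[i], X[j], weights))
        correct += y[nearest] == y[i]
    return correct / len(X)


def evolve_weights(X, y, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Search for attribute weights in [0, 1] with a simple GA (assumed design)."""
    rng = random.Random(seed)
    n_attr = len(X[0])
    # Each individual (chromosome) is one weight per attribute.
    population = [[rng.random() for _ in range(n_attr)] for _ in range(pop_size)]

    def fitness(w):
        return loo_accuracy(w, X, y)

    def tournament(scored):
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(w), w) for w in population]
        best = max(scored, key=lambda s: s[0])[1]
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Single-point crossover between the two parents.
            cut = rng.randrange(1, n_attr) if n_attr > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation, clipped back into [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy data (made up): attribute 0 separates the classes, attribute 1 is noise.
X = [(0.0, 5.0), (0.1, -4.0), (0.2, 3.0), (1.0, 4.5), (1.1, -3.5), (0.9, 2.5)]
y = [0, 0, 0, 1, 1, 1]

best = evolve_weights(X, y)
print(best, loo_accuracy(best, X, y))
```

On this toy data, equal weights misclassify every sample under leave-one-out evaluation, while the weights (1, 0) classify all of them correctly; the GA searches the weight space between such extremes. Because the best individual is carried over in each generation, the best fitness in the population never decreases.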

Application and results

To evaluate the performance of our new configuration, GA-AWAIS, we applied it to two well-known medical classification problems: heart disease and liver disorders classification. We took the necessary datasets, named Statlog Heart Disease and BUPA Liver Disorders, from the UCI machine learning database. We preferred these datasets because they are commonly used by researchers trying to solve medical classification problems with their proposed systems. Also,

Conclusion

Although AIS offers a new tool for solving complex problems with newly developed algorithms, we cannot say that it outperforms other systems, especially in the classification field. A main problem of AIS classifiers is that many of them do not carry classification bias. Most of them use a pure distance criterion to calculate the affinity of system units to the presented data. However, today, many systems using some kind of distance function utilize attribute

Acknowledgement

This study is supported by the Scientific Research Projects of Selçuk University (project no. 05401069).

References (19)

  • A.S. Perelson et al. Theoretical studies of clonal selection: Minimal antibody repertoire size and reliability of self-nonself discrimination. Journal of Theoretical Biology (1979)
  • M. Pöyhönen et al. On the convergence of multiattribute weighting methods. European Journal of Operational Research (2001)
  • A.K. Abbas et al. Cellular and molecular immunology (2000)
  • Q.A. Al-Radaideh et al. Approximate reduct computation by rough set based attribute weighting. IEEE International Conference on Granular Computing (2005)
  • A. Blanché et al. MACLAW: A modular approach for clustering with local attribute weighting. Pattern Recognition Letters (2006)
  • J.H. Carter. The immune system as a model for pattern recognition and classification. Journal of the American Medical Informatics Association (2000)
  • D. Dasgupta et al. Evolutionary algorithms in engineering applications (1997)
  • L.N. De Castro et al. Artificial immune systems: A new computational intelligence approach (2002)
  • H. Frigui and O. Nasraoui. A fast algorithm for discovering categories and attribute relevance in web data. In... (2002)
