Elsevier

Applied Soft Computing

Volume 16, March 2014, Pages 34-49

A novel ant colony optimization based single path hierarchical classification algorithm for predicting gene ontology

https://doi.org/10.1016/j.asoc.2013.11.012

Highlights

  • hAntMiner-C is a hierarchical classifier that handles both tree and DAG topologies.

  • Our classifier can handle single path hierarchical classification.

  • Detailed review of hierarchical single and multi-label classification is included.

  • hAntMiner-C is tested over ion-channel datasets.

  • Our classifier performs statistically significantly better than its competitors.

Abstract

Numerous state-of-the-art classification algorithms are designed to handle data with nominal or binary class labels. Unfortunately, less attention has been given to classification problems in which the classes are organized as a structured hierarchy, such as protein function prediction (the target area of this work), test scores, gene ontology, web page categorization and text categorization. The structured hierarchy is usually represented as a tree or a directed acyclic graph (DAG) in which an IS-A relationship exists among the class labels. Class labels at the upper levels of the hierarchy are more abstract and easier to predict, whereas class labels at deeper levels are more specific and harder to predict correctly. It is helpful to consider this class hierarchy when designing a hypothesis that can handle the tradeoff between prediction accuracy and prediction specificity. In this paper, a novel ant colony optimization (ACO) based single path hierarchical classification algorithm is proposed that incorporates the given class hierarchy during its learning phase. The algorithm produces an ordered list of IF–THEN rules and thus offers a comprehensible classification model. A detailed discussion of the architecture and design of the proposed technique is provided, followed by an empirical evaluation on six ion-channel data sets (related to protein function prediction) and two publicly available data sets. The performance of the algorithm is encouraging compared with the existing methods, based on a Student's t-test for statistical significance (considering both prediction accuracy and specificity), and thus confirms the promising ability of the proposed technique for the hierarchical classification task.

Introduction

Advanced sensing, capturing and computing technologies enable us to collect large amounts of complex (possibly raw) data in many fields of life, but how do we know which portion of the data is important and gives us insight that helps our decision-making process? Moreover, when this problem arises in the medical domain and the decision becomes a matter of life and death for a patient, finding the correct set of information becomes even more crucial. Data mining techniques can be used to extract implicit, previously unknown and potentially useful [1] patterns and knowledge of interest from these vast data stores for varied purposes. One such important data mining technique, known as classification, has been used successfully in a myriad of applications, e.g. decision making, fraud detection, medical diagnosis, credit scoring, customer relationship management, character recognition, speech recognition and protein function prediction.

Numerous classification techniques have been proposed over the decades, such as Decision Trees, Neural Networks (NN), k-Nearest Neighbors (k-NN), Logistic Regression and Support Vector Machines (SVM) [2], [3], [4]. Some of these techniques (e.g. NN and SVM) produce incomprehensible classification models that are usually opaque to common users, while others, e.g. the IF–THEN rule lists produced with decision trees, are easily comprehensible to experts working in different domains. These techniques are reported to perform well in various domains and are considered computationally efficient (e.g. SVM), robust to noisy data (e.g. decision trees) and easy to learn (e.g. k-NN). However, most of these classification techniques are designed to handle data with binary or nominal class labels (where the class labels are independent). These classical strategies lack the ability to handle problems where the class labels are related and are organized according to a class hierarchical structure (CHS). The latter case, known as hierarchical classification, is a more complex instance of the classification problem than one-level flat classification [5]. Hierarchical classification has applications in various domains.

This research work focuses on hierarchical protein function prediction defined under the scheme of the Gene Ontology (GO), specifically the “molecular function” domain. The GO structure [6] represents the relationships among protein functions using a directed acyclic graph (DAG) CHS. It is well known that a protein can perform multiple functions (considered as class labels) and that these functions are usually related (modeled as a tree or DAG based IS-A relationship), which makes protein function prediction a suitable problem for the proposed algorithm. A large amount of uncharacterized protein data is available for analysis, which has led to an increased interest in computational methods to support the investigation of the role of proteins in an organism [7], [8]. Analyzing the functions of proteins in different organisms is crucial to improving biological knowledge and the diagnosis and treatment of diseases. It is not possible to conduct biological experiments for the functional assay analysis of every uncharacterized protein, owing to the high cost and the human effort involved [9]. This raises the need for computational methods (especially those related to the data mining domain, like the one proposed in this paper) that can be used for this purpose.

In hierarchical classification, the class labels to be predicted are naturally organized as a class hierarchy/taxonomy, typically represented as a tree or DAG; see e.g. Fig. 1a and b. The class labels in the hierarchy are represented as nodes, and the relationships between the class labels are shown as edges. In a tree structure, a node can have only one parent, whereas no such restriction is imposed on a DAG CHS. Predicting a single class label in the hierarchy implies that all of its ancestor class labels are also predicted. In other words, a single class label is a path from the root node to the predicted child node (explained later) that is consistent with the IS-A relationship.
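The implication that predicting one class label also predicts all of its ancestors can be illustrated with a minimal sketch. The hierarchy below is a hypothetical toy DAG (it is not taken from Fig. 1), and the names `PARENTS`, `ancestors` and `is_tree` are illustrative assumptions rather than part of the proposed algorithm.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each class label maps to its list of parents.
# Label "c" has two parents, which is allowed in a DAG but not in a tree.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a", "b"],
    "d": ["c"],
}


def ancestors(label: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every class label implied by `label` through IS-A links."""
    implied: Set[str] = set()
    stack = list(parents[label])
    while stack:
        node = stack.pop()
        if node not in implied:
            implied.add(node)
            stack.extend(parents[node])
    return implied


def is_tree(parents: Dict[str, List[str]]) -> bool:
    """A hierarchy is a tree only if no node has more than one parent."""
    return all(len(p) <= 1 for p in parents.values())


# Predicting "d" implicitly predicts "c", "a", "b" and "root" as well.
implied_by_d = ancestors("d", PARENTS)
```

Storing parents rather than children keeps the ancestor lookup direct, and the same representation covers both trees and DAGs.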

Within the class hierarchy, nodes at the upper levels represent more general class labels, whereas nodes at the lower levels represent more specific class labels. General class labels are easy to predict because numerous examples related to them are available to the hypothesis learner. On the other hand, classes at the deeper levels of the hierarchy (i.e. specific classes) are difficult to predict because less information is available to discriminate among them. There is always a tradeoff between generality and specificity in hierarchical classification. In Fig. 2, a dataset is given with the corresponding CHS for different animals. Given a test example from this dataset, if we predict the class label ‘Animal’, the prediction is 100% accurate but gives no valuable information about the specific class of animal. Predicting the specific class of animal is more useful, but the chance of an erroneous prediction is high.
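One way to make this tradeoff concrete is a set-based hierarchical precision and recall computed over the labels on the path to the root. The sketch below is an assumption for illustration only: the excerpt does not state that these are the measures used in the paper, and the animal hierarchy in `PARENT` is hypothetical rather than a copy of Fig. 2.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical animal tree: each label maps to its single parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "Animal": None,
    "Mammal": "Animal",
    "Bird": "Animal",
    "Dog": "Mammal",
    "Cat": "Mammal",
}


def path_to_root(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Collect the label itself and all of its ancestors."""
    path: Set[str] = set()
    node: Optional[str] = label
    while node is not None:
        path.add(node)
        node = parent[node]
    return path


def hierarchical_pr(
    true_label: str, predicted_label: str, parent: Dict[str, Optional[str]]
) -> Tuple[float, float]:
    """Return (precision, recall) over the two root paths."""
    true_set = path_to_root(true_label, parent)
    pred_set = path_to_root(predicted_label, parent)
    overlap = len(true_set & pred_set)
    return overlap / len(pred_set), overlap / len(true_set)


# The general prediction is never wrong but recovers little of the true path.
general = hierarchical_pr("Dog", "Animal", PARENT)
# A specific but wrong prediction is penalized on both measures.
specific_wrong = hierarchical_pr("Dog", "Cat", PARENT)
```

Under these assumptions, predicting ‘Animal’ for a true ‘Dog’ keeps precision at 1 while recall drops to one third, which mirrors the accurate-but-uninformative prediction described above.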

To identify the types of problems that our proposed algorithm can handle, more information is provided as follows. Based on the class label(s) associated with an example, hierarchical classification has two further categories [9]:

  • (1) Hierarchical Single Label (path) Classification: In this type of classification, an example is associated with only one class label at any level of the class hierarchy.

  • (2) Hierarchical Multi-label (path) Classification: In this type of classification, an example can be associated with more than one class label at any level of the hierarchy (multi-paths in the hierarchy).
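The distinction between the two categories can be sketched as a check on the label set attached to an example. This is a minimal illustration on a hypothetical tree; `PARENT`, `depth` and `is_single_path` are assumed names, and the check assumes that each label set already contains the ancestors of its labels.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical tree hierarchy: each label maps to its parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "1": "root",
    "2": "root",
    "1.1": "1",
    "1.2": "1",
    "2.1": "2",
}


def depth(label: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges between `label` and the root."""
    level = 0
    while parent[label] is not None:
        label = parent[label]
        level += 1
    return level


def is_single_path(labels: Iterable[str], parent: Dict[str, Optional[str]]) -> bool:
    """True if the example has at most one class label at every level.

    Assumes the label set is closed under ancestors, so one label per
    level means the labels lie on a single root-to-node path.
    """
    per_level = Counter(depth(label, parent) for label in labels)
    return all(count == 1 for count in per_level.values())


single = is_single_path({"1", "1.1"}, PARENT)        # one label per level
multi = is_single_path({"1", "1.1", "1.2"}, PARENT)  # two labels at level 2
```

A label set that fails this check belongs to the multi-label (multi-path) category, which the text says is outside the scope of the proposed algorithm.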

Hierarchical classification problems can be further divided into two categories based on the level (depth) of the predicted class label [9], [10]: (1) Mandatory Leaf Node Prediction (MLNP), and (2) Optional Leaf Node Prediction (OLNP), also called Non-Mandatory Leaf Node Prediction (NMLNP). Under MLNP, the classifier must predict at least one of the leaf class labels (from the CHS) when classifying a test example. OLNP problems are more flexible: the classifier can predict class label(s) for a test example at any level of the class hierarchy. The proposed algorithm handles only hierarchical single path classification problems, and only the OLNP case.
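The difference between MLNP and OLNP can be expressed as a constraint on which nodes a classifier is allowed to output. The sketch below uses a hypothetical tree, and the names `PARENT`, `leaves` and `is_valid_prediction` are illustrative assumptions, not functions from the paper.

```python
from typing import Dict, Optional, Set

# Hypothetical tree hierarchy: each label maps to its parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "1": "root",
    "2": "root",
    "1.1": "1",
    "1.2": "1",
}


def leaves(parent: Dict[str, Optional[str]]) -> Set[str]:
    """Leaf labels are the nodes that are not the parent of any other node."""
    return set(parent) - set(parent.values())


def is_valid_prediction(
    label: str, parent: Dict[str, Optional[str]], mandatory_leaf: bool
) -> bool:
    """Check a predicted label against the MLNP or OLNP constraint."""
    if label not in parent:
        return False  # unknown class label
    if mandatory_leaf:
        return label in leaves(parent)  # MLNP: only leaf labels are allowed
    return True  # OLNP: a label at any level is allowed


internal_under_mlnp = is_valid_prediction("1", PARENT, mandatory_leaf=True)
internal_under_olnp = is_valid_prediction("1", PARENT, mandatory_leaf=False)
```

In this sketch, OLNP lets a classifier stop at an internal node such as "1" when the evidence does not support a more specific label, while MLNP rejects that prediction.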

The remainder of this paper is organized as follows. In the next section, we review related research on the hierarchical classification task. In Section 3, we briefly present the basics and background of the ACO meta-heuristic. In Section 4, the architecture of the proposed solution is discussed. Subsequently, in Section 5, we present simulation results to show the promising ability of our technique. Finally, Section 6 concludes this work.

Section snippets

Related work

One simple approach to hierarchical classification problems is to ignore the given CHS completely and use a flat classification algorithm (e.g. a decision tree or SVM) that predicts only leaf class nodes. This approach provides an indirect solution to the hierarchical classification problem because, if a class at a leaf node is predicted, all of its ancestor classes (considering the IS-A relationship) are also implicitly assigned to the instance being classified. However, this approach

Ant colony optimization

Swarm Intelligence [16], [17], [18], which deals with the collective behavior of small and simple entities, has been used in many application domains. ACO, proposed in the early 1990s [19], [20], [21], [22], is one of the best-known meta-heuristics under the umbrella of Swarm Intelligence. Since its inception, ACO has been used to solve many complex problems, including those related to data mining [23], [24], [25] as well as other combinatorial optimization problems. ACO is inspired by the food

Method

In this section, we discuss the different stages of the proposed ACO based hierarchical classification algorithm. We begin with the definition of the problem tackled in this work, followed by a brief general description of the proposed algorithm. Afterwards, each stage of the approach is discussed in detail. The stages are: search space design, rule construction based on pheromone and a correlation based heuristic function, rule evaluation, rule pruning, pheromone

Results and discussion

In this section, we present the simulation results of our proposed method (hAM-C) in comparison with another ACO based hierarchical single path classification algorithm (hAM), proposed in the work of Otero et al. [7]. The proposed algorithm is implemented in the Microsoft Visual Studio (2008) development environment using the C# language. For hAM [7], a Java based implementation was kindly made available by Otero. All the experiments are conducted on an Intel Core i3 Processor

Conclusion

In this article, we have presented a novel ant colony optimization based single path hierarchical classification algorithm, named hAM-C. A detailed review of the different types of hierarchical classification problems and the different categories of corresponding solutions is also provided to help readers understand the target problem. Extending the ideas of our previous flat classification algorithm AntMiner-C, hAM-C is tailored to handle the hierarchical

References (37)

  • Y.-L. Chen et al.

    Constructing a decision tree from data with hierarchical class labels

    Expert Systems with Applications

    (2009)
  • J.R. Quinlan

    C4.5: Programs for Machine Learning

    (1993)
  • J.R. Quinlan

    Generating production rules from decision trees

  • V.N. Vapnik

    The Nature of Statistical Learning Theory

    (1995)
  • A. Freitas et al.

    A tutorial on hierarchical classification with applications in bioinformatics

  • M. Ashburner

    The gene ontology: tool for the unification of biology

    Nature Genetics

    (2000)
  • F.E.B. Otero et al.

    A hierarchical classification ant colony algorithm for predicting gene ontology terms

  • F.E.B. Otero et al.

    A hierarchical multi-label classification ant colony algorithm for protein function prediction

    Memetic Computing

    (2010)
  • F.E.B. Otero

    New Ant colony optimization algorithms for hierarchical classification of protein functions

    (2010)
  • C.N. Silla et al.

    A survey of hierarchical classification across different application domains

    Data Mining & Knowledge Discovery

    (2010)
  • D. Koller et al.

    Hierarchically classifying documents using very few words

  • C. Vens et al.

    Decision trees for hierarchical multi-label classification

    Machine Learning

    (2008)
  • H. Blockeel et al.

    Top-down induction of clustering trees

  • R.S. Parpinelli et al.

    Data mining with an ant colony optimization algorithm

    IEEE Transactions on Evolutionary Computation

    (2002)
  • F. Otero et al.

    cAnt-Miner: an ant colony classification algorithm to cope with continuous attributes

  • A.P. Engelbrecht

    Computational Intelligence, An Introduction

    (2007)
  • A.P. Engelbrecht

    Fundamentals of Computational Swarm Intelligence

    (2005)
  • J. Kennedy et al.

    Swarm Intelligence

    (2001)