Ensemble of ML-KNN for classification algorithm recommendation

https://doi.org/10.1016/j.knosys.2021.106933

Highlights

  • Achieve better generalization and recommendation performance with an ensemble of ML-KNN.

  • Take advantage of the diversity among different types of meta-features.

  • Recommend a varying number of appropriate algorithms for different classification problems.

Abstract

With the mountains of classification algorithms proposed in the literature, the study of how to select suitable classifier(s) for a given problem is important and practical. Existing methods rely on a single learner built on one type of meta-features, or on a simple combination of several types of meta-features, to address this problem. In this paper, we propose a two-layer classification algorithm recommendation method called EML (Ensemble of ML-KNN for classification algorithm recommendation) to leverage the diversity of different sets of meta-features. The proposed method can automatically recommend different numbers of appropriate algorithms for different datasets, rather than specifying a fixed number of appropriate algorithm(s) as done by the ML-KNN, SLP-based and OBOE methods. Experimental results on 183 public datasets show the effectiveness of the EML method compared to the three baseline methods.

Introduction

Classification is one of the most important problems in data mining and has been widely studied. There are a large variety of classification algorithms proposed in the literature, such as tree-based (e.g. C4.5 [1] and CART [2]), probability-based (e.g. Naive Bayes [3] and AODE [4]), and rule-based (e.g. OneR [5] and Ripper [6]). In addition, an increasing number of new classification algorithms are being proposed based on distinct mechanisms for different classification problems [7].

However, both the "No Free Lunch" theorem [8] and experimental results [9], [10] have demonstrated that no single classification algorithm is applicable to all classification problems, namely all classification datasets. Thus, how to select the appropriate classification algorithm(s) for a given dataset is a major challenge, especially for non-experts.

Studies have shown that the appropriate algorithms vary across datasets, and that the performance of an algorithm is closely associated with the characteristics of the dataset [11]. Therefore, the key to this challenge is to explore the relationship between the characteristics of datasets and the performance of candidate algorithms, and then construct a model that recommends appropriate algorithm(s) for a new dataset. This research topic, known as algorithm recommendation in the field of data mining, has drawn the attention of many researchers [12], [13], [14], [15].

Generally, researchers [12], [16] view classification algorithm recommendation as a meta-learning problem, where the meta-features are the characteristics of a dataset and the meta-target represents the relative performance of candidate algorithms on the dataset.

Formally, the classification algorithm recommendation problem can be viewed as a learning problem solved in two steps: (1) searching for a function f: X → Y, where X ⊆ ℝ^p is the meta-feature space with p meta-features and Y ⊆ {0,1}^q is the meta-target space with q candidate algorithms; (2) recommending appropriate algorithms Y_new ∈ Y for a new dataset d_new according to f(X_new), where X_new ∈ X is the meta-feature vector of d_new.
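To make the meta-feature space X concrete, the sketch below computes a few simple, commonly used dataset characteristics (size, dimensionality, number of classes, class entropy) in plain Python. The particular choice of meta-features here is an illustrative assumption, not the exact feature set used in the paper.

```python
import math
from collections import Counter

def meta_features(X, y):
    """Compute a few simple dataset characteristics (illustrative choice).

    X: list of feature vectors, y: list of class labels.
    Returns a fixed-length vector in the meta-feature space R^p.
    """
    n, p = len(X), len(X[0])
    counts = Counter(y)
    # Class entropy: a statistical meta-feature describing label balance.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [n, p, len(counts), entropy]

# A toy 4-instance, 2-feature, 2-class dataset.
X = [[0.1, 1.0], [0.4, 0.9], [0.8, 0.2], [0.9, 0.1]]
y = ["a", "a", "b", "b"]
print(meta_features(X, y))  # → [4, 2, 2, 1.0]
```

Each historical dataset contributes one such vector; together with its meta-target, this forms one meta-example for the recommendation learner.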

Three representations of the meta-target Y are commonly used in the literature: single-label-based, ranking-label-based and multilabel-based. With different representations of the meta-target, different methods can be used to solve the classification algorithm recommendation problem. For a single-label-based meta-target, single-label learning methods can be utilized to recommend the single algorithm that achieves the best performance [9], [14], [17]. For a ranking-label-based meta-target, ranking learning methods [10], [16] or regression methods [18], [19], [20] can be adopted to recommend a ranked list of the candidate algorithms according to their predictive performance.
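The three meta-target representations can be contrasted on a toy example. The accuracy values and the equivalence-by-margin rule below are illustrative assumptions; the multilabel literature cited here decides equivalence with statistical tests rather than a fixed margin.

```python
# Suppose q = 3 candidate algorithms evaluated on one dataset,
# with (invented) accuracies 0.91, 0.90, 0.74.
algorithms = ["C4.5", "NaiveBayes", "OneR"]
accuracy = [0.91, 0.90, 0.74]

# Single-label: only the single best algorithm is the target.
single_label = algorithms[accuracy.index(max(accuracy))]

# Ranking-label: the full list of algorithms ranked by performance.
ranking_label = sorted(algorithms, key=lambda a: -accuracy[algorithms.index(a)])

# Multilabel: every algorithm whose performance is "equivalent" to the
# best (here approximated by an accuracy margin) is marked appropriate.
margin = 0.02
multilabel = [1 if max(accuracy) - acc <= margin else 0 for acc in accuracy]

print(single_label)   # → C4.5
print(ranking_label)  # → ['C4.5', 'NaiveBayes', 'OneR']
print(multilabel)     # → [1, 1, 0]
```

The multilabel form is the one adopted in this paper: it can mark any number of algorithms, from one to all q, as appropriate for a dataset.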

Wang et al. [21] noted and demonstrated that classification algorithm recommendation is closer to a multilabel learning problem. They took all the algorithms whose performance was statistically equivalent to that of the best algorithm as appropriate algorithms, so their meta-target was in multilabel form containing all the appropriate algorithms. ML-KNN [22] was used as the multilabel learning method to recommend the top r algorithms for a dataset, where the value of r needed to be specified in advance. Their empirical study validated the effectiveness of the ML-KNN-based classification algorithm recommendation method compared to single-label-based and ranking-label-based methods.
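The neighbor-based recommendation idea can be sketched as follows. This is a much-simplified approximation: it finds the k datasets closest in meta-feature space and ranks algorithms by neighbor votes, whereas ML-KNN proper refines the votes with MAP estimation over prior and posterior label probabilities.

```python
import math

def knn_recommend(meta_db, targets, x_new, k=2, r=2):
    """Simplified neighbor-based multilabel recommendation (not full ML-KNN).

    meta_db: meta-feature vectors of historical datasets,
    targets: their multilabel meta-targets (1 = appropriate),
    x_new:   meta-feature vector of the new dataset.
    Returns indices of the top-r algorithms by neighbor votes.
    """
    # k nearest historical datasets in meta-feature space.
    neighbors = sorted(range(len(meta_db)),
                       key=lambda i: math.dist(meta_db[i], x_new))[:k]
    q = len(targets[0])
    # One vote per neighbor that found algorithm j appropriate.
    votes = [sum(targets[i][j] for i in neighbors) for j in range(q)]
    return sorted(range(q), key=lambda j: -votes[j])[:r]

# Toy meta-database: 3 historical datasets, 3 candidate algorithms.
meta_db = [[1.0, 0.0], [1.0, 1.0], [9.0, 9.0]]
targets = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
print(knn_recommend(meta_db, targets, [1.0, 0.5]))  # → [0, 1]
```

Note that r is fixed in advance here, which is exactly the limitation the proposed EML method removes.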

Zhu et al. [23] applied a link prediction-based method (SLP) to recommend the top r appropriate algorithms. It achieved better performance than the ML-KNN-based method because it considered the influence of more datasets, rather than only the neighbors of the given dataset.

Yang et al. [24] proposed OBOE, which utilized collaborative filtering, for classification algorithm recommendation. By choosing and testing several initial candidate algorithms, it predicted the performance of all the candidate algorithms according to the performance of the initial candidate algorithms. It selected the top r algorithms for classification of the new problem.

All three multilabel-form classification algorithm recommendation methods (ML-KNN-based, SLP-based and OBOE) depend on a single learner, which limits recommendation performance and leaves room for improvement. In addition, the number of recommended algorithms r must be specified in advance, yet this number varies across datasets. Wang et al. [21] showed that, with 13 candidate algorithms, the number of appropriate algorithms for different datasets ranged from 1 to 13. Thus, designating a fixed number of appropriate algorithms for all datasets is improper, quite apart from the difficulty of determining this varying number for different datasets.

To overcome the problems discussed above, in this paper we propose a two-layer learning method, EML, which is an ensemble of ML-KNN, for classification algorithm recommendation. The EML method is based on the stacking framework [25] and has three primary advantages:

  • (1)

    Taking ML-KNN-based methods as weak learners, it is expected to produce a stronger learner by combining multiple ML-KNN models, achieving better recommendation performance;

  • (2)

    It can take advantage of the diversity of various sets of meta-features;

  • (3)

    It does not need to specify the number of appropriate algorithms beforehand. Instead, it automatically recommends a proper number of algorithms, which varies across classification problems.
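The variable-size recommendation idea behind advantage (3) can be sketched minimally, under strong simplifying assumptions: the second layer below is a plain average of base-learner scores with a fixed 0.5 cut-off, whereas EML learns the combination in its second layer; the score values are invented for illustration.

```python
def eml_recommend(base_scores, threshold=0.5):
    """Combine per-algorithm scores from several base learners (each
    built on a different type of meta-features) and recommend every
    algorithm whose combined score passes the threshold.

    base_scores: one score vector per base learner.
    The number of recommended algorithms therefore varies per dataset
    instead of being fixed in advance.
    """
    q = len(base_scores[0])
    combined = [sum(s[j] for s in base_scores) / len(base_scores)
                for j in range(q)]
    return [j for j in range(q) if combined[j] >= threshold]

# Invented scores from three base learners, e.g. built on statistical,
# information-theoretic and model-based meta-features respectively.
scores = [
    [0.9, 0.6, 0.1],
    [0.8, 0.5, 0.2],
    [0.7, 0.7, 0.3],
]
print(eml_recommend(scores))  # → [0, 1]
```

A dataset on which only one algorithm scores highly would receive a single recommendation, while an easy dataset could receive many; no fixed r is ever specified.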

The main contributions of the paper include:

  • (1)

    An ensemble of ML-KNN is proposed to improve the performance of classification algorithm recommendations;

  • (2)

    Different types of meta-features are fully utilized;

  • (3)

    An extensive empirical study is conducted to validate the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section 2 provides a summary of the related work. Section 3 presents preliminaries for the proposed EML method, while Section 4 describes the details of EML. Section 5 conducts an empirical study. Section 6 presents the threats to validity. The conclusion of this work is given in Section 7.

Section snippets

Related work

Different researchers have studied the classification algorithm recommendation problem from different perspectives, most of which focus on analyzing the relationship between characteristics of the datasets and performance of the classification algorithms with experimental approaches. The existing classification algorithm recommendation methods can be divided into two categories: theoretical and experimental.

Brodley [26] proposed a heuristic approach that can recognize the best classification

Preliminary

In contrast to conventional machine learning methods, an ensemble learning method attempts to construct a set of learning models, i.e., base learners, within a unified framework, and then combines these base learners to make predictions. The generalization ability of an ensemble learner is often more robust than that of a single constituent base learner [36], [37].

An ensemble learner usually shows better performance since it can partly overcome the following problems encountered by a

The proposed EML method

In this section, we first give a general view of the proposed recommendation method. We then describe the proposed method in detail. To facilitate presentation and understanding, a description of the main notations used in the paper is given in Table 1.

Empirical study

To evaluate the effectiveness of the EML method, an empirical study is conducted in this section. First, the experimental setup is explained. Then, the performance of EML is compared with the baseline methods. Finally, the comparison on the number of recommended algorithms and recommendation time between EML and the baseline methods is presented.

Threats to validity

A possible threat to the validity of the work lies in whether the 183 datasets and the 20 candidate classification algorithms used in the empirical study are representative of the broader population of datasets and classification algorithms. Choosing widely used datasets and well-known classification algorithms is the primary means by which this study attempts to avoid sample bias.

Another threat is the use of ACC and ARR as the classification evaluation metrics. These two

Conclusion

In this paper, we propose a two-layer classification algorithm recommendation method, EML, which is based on ensemble learning. The proposed EML method can take advantage of different combinations of meta-features and automatically recommend different numbers of appropriate algorithm(s) for different classification problems.

The proposed EML method consists of three steps: (1) meta-data extraction, including meta-target identification and meta-feature collection; (2) model construction, which

CRediT authorship contribution statement

Xiaoyan Zhu: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Chenzhen Ying: Methodology, Software, Writing - original draft, Writing - review & editing. Jiayin Wang: Validation, Funding acquisition. Jiaxuan Li: Software. Xin Lai: Writing - review & editing. Guangtao Wang: Conceptualization, Methodology.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their insightful and helpful comments and suggestions, which resulted in substantial improvements to this work. This work is supported by the National Natural Science Foundation of China (Grant Nos. 92046009, 71872146) and the Natural Science Basic Research Program of Shaanxi, China (Grant No. 2020JC-01).

References (54)

  • Lu, J. et al.

    Transfer learning using computational intelligence: A survey

    Knowl.-Based Syst.

    (2015)
  • Quinlan, J.R.

    C4.5: Programs for Machine Learning

    (2014)
  • Breiman, L. et al.

    Classification and Regression Trees

    (1984)
  • Moore, A.W. et al.

    Internet traffic classification using Bayesian analysis techniques

  • Webb, G.I. et al.

    Not so naive Bayes: aggregating one-dependence estimators

    Mach. Learn.

    (2005)
  • Holte, R.C.

    Very simple classification rules perform well on most commonly used datasets

    Mach. Learn.

    (1993)
  • W.W. Cohen, Fast effective rule induction, in: Proceedings of the twelfth international conference on machine learning,...
  • Huang, W. et al.

    Relation classification via knowledge graph enhanced transformer encoder

    Knowl.-Based Syst.

    (2020)
  • D.H. Wolpert, The supervised learning no-free-lunch theorems, in: World Conference on Soft Computing, 2002, pp....
  • P.B. Brazdil, C. Soares, A comparison of ranking methods for classification algorithm selection, in: European...
  • L. Chekina, L. Rokach, B. Shapira, Meta-learning for selecting a multi-label classification algorithm, in: IEEE...
  • Wang, G. et al.

    An improved data characterization method and its application in classification algorithm recommendation

    Appl. Intell.

    (2015)
  • Gore, S. et al.

    Dynamic algorithm selection for data mining classification

    Int. J. Sci. Eng. Res.

    (2013)
  • Brazdil, P.B. et al.

    Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results

    Mach. Learn.

    (2003)
  • Lee, J.W. et al.

    Automatic selection of classification learning algorithms for data mining practitioners

    Intell. Data Anal.

    (2013)
  • Bensusan, H. et al.

    Estimating the predictive accuracy of a classifier

  • Reif, M. et al.

    Automatic classifier selection for non-experts

    Pattern Anal. Appl.

    (2014)