Abstract
The algorithm selection problem is defined as identifying the best-performing machine learning (ML) algorithm for a given combination of dataset, task, and evaluation measure. The human expertise required to evaluate the growing number of available ML algorithms has created a need to automate the algorithm selection task. Various approaches have emerged to address this challenge, among them meta-learning, a popular approach that leverages accumulated experience for future learning and typically involves dataset characterization. Existing meta-learning methods often represent a dataset using predefined features and thus cannot be generalized across different ML tasks, or, alternatively, learn a dataset's representation in a supervised manner and are therefore unable to handle unsupervised tasks. In this study, we propose a novel learning-based, task-agnostic method for producing dataset representations. We then introduce TRIO, a meta-learning approach that utilizes the proposed dataset representations to accurately recommend top-performing algorithms for previously unseen datasets. TRIO first learns a graphical representation of each dataset, using four learners to capture the latent interactions among the dataset's instances, and then applies a graph convolutional neural network to extract an embedding representation from the resulting graph. We extensively evaluate the effectiveness of our approach on 337 datasets and 195 ML algorithms, demonstrating that TRIO significantly outperforms state-of-the-art algorithm selection methods for both supervised (classification and regression) and unsupervised (clustering) tasks.
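The pipeline described above can be sketched compactly. The snippet below is a minimal, illustrative Python sketch, not the paper's implementation: scikit-learn's RandomTreesEmbedding stands in for the four learners TRIO uses to capture instance co-occurrence, the graph convolutional layers are untrained random projections (TRIO trains its network), and recommendation is reduced to a nearest-neighbour lookup over stored dataset embeddings. The function names (dataset_embedding, recommend) and all modelling choices are assumptions made for illustration.

```python
# Illustrative TRIO-style pipeline (a sketch, not the authors' code):
# 1) capture latent instance interactions with an unsupervised tree ensemble,
# 2) build a graph from leaf co-occurrence,
# 3) propagate features through GCN-style layers and mean-pool into one vector,
# 4) recommend algorithms by nearest-neighbour search in embedding space.
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

def dataset_embedding(X: np.ndarray, dim: int = 300, n_layers: int = 4,
                      seed: int = 0) -> np.ndarray:
    """Embed a dataset (n_samples x n_features) as a single vector."""
    # One-hot leaf indicators per tree; instances sharing a leaf are related.
    leaves = RandomTreesEmbedding(n_estimators=50, random_state=seed).fit_transform(X)
    A = (leaves @ leaves.T).toarray()       # leaf co-occurrence counts (dense
    np.fill_diagonal(A, 0)                  # adjacency: fine for small data)
    A = A + np.eye(A.shape[0])              # add self-loops
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))     # symmetric normalization D^-1/2 A D^-1/2
    rng = np.random.default_rng(seed)
    H = X.astype(float)
    for _ in range(n_layers):               # untrained GCN-style propagation
        W = rng.normal(scale=H.shape[1] ** -0.5, size=(H.shape[1], dim))
        H = np.maximum(A_hat @ H @ W, 0.0)  # ReLU(A_hat @ H @ W)
    return H.mean(axis=0)                   # mean-pool nodes -> dataset vector

def recommend(new_X: np.ndarray, meta_embeddings: np.ndarray,
              best_algorithms: list, k: int = 1) -> list:
    """Return the top algorithm(s) of the most similar known dataset(s)."""
    q = dataset_embedding(new_X)
    dists = np.linalg.norm(meta_embeddings - q, axis=1)
    return [best_algorithms[i] for i in np.argsort(dists)[:k]]
```

Because the tree ensemble and the graph construction never touch labels, the same embedding procedure applies unchanged to classification, regression, and clustering datasets, which is the task-agnostic property the abstract emphasizes.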
Notes
Numbers of graph convolutional layers in {2, 3, 4, 5, 6} and embedding dimensions in {50, 100, 200, 300, 400, 500} were tested; four layers and an embedding dimension of 300 were found to produce the best results with reasonable efficiency across all models, as sketched below.
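The sweep above amounts to a simple grid search over the two hyperparameters. The sketch below assumes a hypothetical evaluate function standing in for training and validating the GCN under each configuration; its placeholder body merely reproduces the reported optimum.

```python
# Sketch of the hyperparameter sweep from the note above. `evaluate` is a
# hypothetical placeholder; in practice it would train the GCN with the given
# configuration and return a validation score for algorithm recommendation.
from itertools import product

def evaluate(n_layers: int, emb_dim: int) -> float:
    # Placeholder scoring function -- replace with real training/validation.
    return -abs(n_layers - 4) - abs(emb_dim - 300) / 100.0

grid = product([2, 3, 4, 5, 6], [50, 100, 200, 300, 400, 500])
best_layers, best_dim = max(grid, key=lambda cfg: evaluate(*cfg))
print(best_layers, best_dim)  # -> 4 300, matching the reported configuration
```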
Cite this article
Cohen-Shapira, N., Rokach, L. Learning dataset representation for automatic machine learning algorithm selection. Knowl Inf Syst 64, 2599–2635 (2022). https://doi.org/10.1007/s10115-022-01716-2