research-article

AutoGRD: Model Recommendation Through Graphical Dataset Representation

Authors:

Noy Cohen-Shapira,

Bracha Shapira,

Roman Vainshtein

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Pages 821 - 830

https://doi.org/10.1145/3357384.3357896

Published: 03 November 2019

Abstract

The widespread use of machine learning algorithms and the high level of expertise required to utilize them have fuelled the demand for solutions that can be used by non-experts. One of the main challenges non-experts face in applying machine learning to new problems is algorithm selection - the identification of the algorithm(s) that will deliver top performance for a given dataset, task, and evaluation measure. We present AutoGRD, a novel meta-learning approach for algorithm recommendation. AutoGRD first represents datasets as graphs and then extracts their latent representation that is used to train a ranking meta-model capable of accurately recommending top-performing algorithms for previously unseen datasets. We evaluate our approach on 250 datasets and demonstrate its effectiveness both for classification and regression tasks. AutoGRD outperforms state-of-the-art meta-learning and Bayesian methods.
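The abstract describes a three-stage pipeline: turn a dataset into a graph, extract a fixed-length representation of that graph, and use a ranking meta-model over such representations to recommend algorithms for an unseen dataset. The Python sketch below illustrates that flow under loudly stated assumptions. The abstract does not say how AutoGRD builds or embeds its graphs, so every component here is a hypothetical stand-in, not the authors' method: `build_knn_graph` links each row to its k nearest rows, `graph_features` uses three hand-crafted graph statistics in place of a learned latent representation, and `rank_algorithms` is a simple instance-based meta-model that orders algorithms by their summed rank on the most similar known datasets. The meta-database scores are made up for illustration.

```python
import math
from statistics import mean


def build_knn_graph(rows, k=2):
    """Hypothetical graph construction: undirected k-nearest-neighbour graph over rows."""
    n = len(rows)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Sort the other rows by Euclidean distance and link row i to the k closest.
        others = sorted((math.dist(rows[i], rows[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def graph_features(adj):
    """Stand-in for a learned embedding: mean degree, edge density, mean clustering."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    edges = sum(degrees) / 2
    density = edges / (n * (n - 1) / 2) if n > 1 else 0.0
    clustering = []
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            clustering.append(0.0)
            continue
        # Count edges between neighbours, each unordered pair once.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        clustering.append(links / (d * (d - 1) / 2))
    return [mean(degrees), density, mean(clustering)]


def rank_algorithms(target_vec, meta_db, k=2):
    """Hypothetical instance-based meta-model: rank algorithms by their summed rank
    on the k known datasets whose feature vectors are closest to target_vec.

    meta_db holds (feature_vector, {algorithm: score}) pairs, higher score = better,
    and assumes every algorithm was scored on every dataset.
    """
    nearest = sorted(meta_db, key=lambda entry: math.dist(target_vec, entry[0]))[:k]
    rank_sums = {}
    for _, scores in nearest:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, algo in enumerate(ordered, start=1):
            rank_sums[algo] = rank_sums.get(algo, 0) + position
    # Lowest summed rank first; ties broken alphabetically for determinism.
    return sorted(rank_sums, key=lambda algo: (rank_sums[algo], algo))


# Toy meta-database: feature vectors of previously seen datasets (given directly here)
# with made-up per-algorithm scores; illustrative only, not results from the paper.
meta_db = [
    ([2.5, 0.5, 0.4], {"random_forest": 0.91, "svm": 0.85, "knn": 0.86}),
    ([2.4, 0.48, 0.5], {"random_forest": 0.88, "svm": 0.90, "knn": 0.79}),
    ([4.0, 0.9, 0.9], {"random_forest": 0.70, "svm": 0.72, "knn": 0.95}),
]
new_rows = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
vec = graph_features(build_knn_graph(new_rows, k=2))
print(rank_algorithms(vec, meta_db, k=2))  # ['random_forest', 'svm', 'knn']
```

The three stages are deliberately decoupled: replacing the graph construction, swapping the hand-crafted statistics for a learned graph embedding, or training a proper ranking model would each change one function while keeping the interface (rows in, ranked algorithm names out). The sketch also omits feature scaling, which a real meta-model would need before computing distances.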

Index Terms

AutoGRD: Model Recommendation Through Graphical Dataset Representation
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

November 2019

3373 pages

ISBN:9781450369763

DOI:10.1145/3357384

General Chairs: Wenwu Zhu (Tsinghua University, China), Dacheng Tao (University of Sydney, Australia), Xueqi Cheng (Institute of Computing Technology, CAS, China)

Program Chairs: Peng Cui (Tsinghua University, China), Elke Rundensteiner (Worcester Polytechnic Institute, USA), David Carmel (Amazon Research, USA), Qi He (LinkedIn, USA), Jeffrey Xu Yu (Chinese University of Hong Kong, China)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Best Industry Paper
Best Paper

Conference

CIKM '19: The 28th ACM International Conference on Information and Knowledge Management

November 3 - 7, 2019

Beijing, China

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total citations: 23

Total downloads: 998 (last 12 months: 51; last 6 weeks: 8)

Reflects downloads up to 09 Feb 2025
Cited By

de Hoog J, Anwar A, Hellinckx P, Mercelis S (2024). Selection of image classifiers for noisy images through metalearning. Proceedings of the 2024 7th International Conference on Machine Vision and Applications, 92-99. Online publication date: 12-Mar-2024. https://dl.acm.org/doi/10.1145/3653946.3653960

Mu T, Wang H, Tang H, Shao X (2024). ShrinkHPO: Towards Explainable Parallel Hyperparameter Optimization. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4897-4910. Online publication date: 13-May-2024. https://doi.org/10.1109/ICDE60146.2024.00371

Xu F, Chen J, Shi Y, Ruan T, Wu Q, Zhang X (2024). 3D meta-classification: A meta-learning approach for selecting 3D point-cloud classification algorithm. Information Sciences, 120272. Online publication date: Feb-2024. https://doi.org/10.1016/j.ins.2024.120272

Dagan I, Vainshtein R, Katz G, Rokach L (2024). Automated algorithm selection using meta-learning and pre-trained deep convolution neural networks. Information Fusion 105, 102210. Online publication date: May-2024. https://doi.org/10.1016/j.inffus.2023.102210

Garouani M, Ahmad A, Bouneffa M, Hamlich M (2023). Autoencoder-kNN meta-model based data characterization approach for an automated selection of AI algorithms. Journal of Big Data 10(1). Online publication date: 3-Feb-2023. https://doi.org/10.1186/s40537-023-00687-7

Li Z, Qi B, Sun H, Gao X (2023). AutoMRM: A Model Retrieval Method Based on Multimodal Query and Meta-learning. In Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R (Eds.), Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1228-1237. Online publication date: 21-Oct-2023. https://dl.acm.org/doi/10.1145/3583780.3614787

Ding H, Zou P, Wang Z, Zhao J, Wang Y, Zhou Q (2023). A ModelOps-Based Framework for Intelligent Medical Knowledge Extraction. 2023 IEEE International Conference on Medical Artificial Intelligence (MedAI), 254-259. Online publication date: 18-Nov-2023. https://doi.org/10.1109/MedAI59581.2023.00039

Mu T, Wang H, Zheng S, Liang Z, Wang C, Shao X, Liang Z (2023). TSC-AutoML: Meta-learning for Automatic Time Series Classification Algorithm Selection. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 1032-1044. Online publication date: Apr-2023. https://doi.org/10.1109/ICDE55515.2023.00084

Shao X, Wang H, Zhu X, Xiong F, Mu T, Zhang Y (2023). EFFECT: Explainable framework for meta-learning in automatic classification algorithm selection. Information Sciences 622, 211-234. Online publication date: Apr-2023. https://doi.org/10.1016/j.ins.2022.11.144

Santhiappan S, Shravan N, Ravindran B (2023). CIAMS: clustering indices-based automatic classification model selection. International Journal of Data Science and Analytics. Online publication date: 19-Aug-2023. https://doi.org/10.1007/s41060-023-00441-5
