DOI: 10.1145/3357384.3357896
research-article

AutoGRD: Model Recommendation Through Graphical Dataset Representation

Published: 03 November 2019

Abstract

The widespread use of machine learning algorithms, together with the high level of expertise required to use them, has fuelled demand for solutions that non-experts can apply. One of the main challenges non-experts face when applying machine learning to new problems is algorithm selection: identifying the algorithm(s) that will deliver top performance for a given dataset, task, and evaluation measure. We present AutoGRD, a novel meta-learning approach to algorithm recommendation. AutoGRD first represents each dataset as a graph and then extracts a latent representation of that graph; these representations are used to train a ranking meta-model that accurately recommends top-performing algorithms for previously unseen datasets. We evaluate our approach on 250 datasets and demonstrate its effectiveness for both classification and regression tasks. AutoGRD outperforms state-of-the-art meta-learning and Bayesian methods.
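The pipeline described in the abstract (dataset to graph, graph to latent representation, representation to ranking meta-model, ranking to recommendation for an unseen dataset) can be illustrated with a minimal, standard-library-only Python sketch. It is not AutoGRD's implementation and rests on several assumptions: the graph construction (instances as nodes, edges weighted by how often two instances fall on the same side of random axis-aligned splits), the bin-histogram signature used in place of a learned graph embedding, and the k-nearest-neighbour ranker used in place of a trained ranking meta-model are all illustrative choices, and the function names `dataset_to_graph`, `graph_signature` and `recommend` are hypothetical.

```python
import math
import random
from itertools import combinations


def dataset_to_graph(rows, n_splits=50, seed=0):
    """Hypothetical construction: instances become nodes; the weight of edge
    (i, j) counts how many random axis-aligned splits put rows i and j on the
    same side. An illustrative assumption, not AutoGRD's exact method."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    adj = [[0] * n for _ in range(n)]
    for _ in range(n_splits):
        f = rng.randrange(n_features)                 # random feature
        threshold = rng.choice([r[f] for r in rows])  # threshold taken from the data
        side = [r[f] <= threshold for r in rows]
        for i, j in combinations(range(n), 2):
            if side[i] == side[j]:
                adj[i][j] += 1
                adj[j][i] += 1
    return adj


def graph_signature(adj, n_splits, n_bins=5):
    """Fixed-length dataset representation: the fraction of node pairs whose
    normalised edge weight falls in each of n_bins equal-width bins. A simple
    stand-in for a learned graph embedding; its length does not depend on the
    number of instances. Requires at least two rows."""
    counts = [0] * n_bins
    pairs = list(combinations(range(len(adj)), 2))
    for i, j in pairs:
        w = adj[i][j] / n_splits                      # normalised to [0, 1]
        counts[min(int(w * n_bins), n_bins - 1)] += 1
    return [c / len(pairs) for c in counts]


def recommend(new_sig, meta_db, k=2):
    """k-nearest-neighbour meta-model (swapped in for a trained ranker).

    meta_db holds (signature, {algorithm: rank}) pairs for previously seen
    datasets, rank 1 being best, with every dataset ranking the same
    algorithms. Algorithms are ordered by their summed rank over the k
    closest datasets (Euclidean distance), ties broken by name."""
    neighbours = sorted(meta_db, key=lambda item: math.dist(new_sig, item[0]))[:k]
    totals = {}
    for _, ranks in neighbours:
        for algo, rank in ranks.items():
            totals[algo] = totals.get(algo, 0) + rank
    return sorted(totals, key=lambda algo: (totals[algo], algo))


if __name__ == "__main__":
    rows = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.1], [0.9, 0.2]]
    adj = dataset_to_graph(rows, n_splits=20)
    sig = graph_signature(adj, n_splits=20)
    meta_db = [  # toy meta-knowledge base: (signature, algorithm ranks)
        ([1.0, 0.0, 0.0, 0.0, 0.0], {"svm": 1, "random_forest": 2, "knn": 3}),
        ([0.0, 0.0, 0.0, 0.0, 1.0], {"random_forest": 1, "knn": 2, "svm": 3}),
        ([0.0, 0.0, 0.0, 0.5, 0.5], {"random_forest": 1, "svm": 2, "knn": 3}),
    ]
    print(recommend(sig, meta_db, k=2))
```

The histogram signature is used here only because it gives every dataset a vector of the same length regardless of how many instances it contains, which is what lets a meta-model compare datasets at all; a learned graph embedding would fill the same role.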

    Published In

    ACM Conferences
    CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
    November 2019
    3373 pages
    ISBN:9781450369763
    DOI:10.1145/3357384
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Badges

    • Best Industry Paper
    • Best Paper

    Author Tags

    1. algorithm selection
    2. automl
    3. classification
    4. dataset representation
    5. graph embedding
    6. meta-learning
    7. regression

    Conference

    CIKM '19

    Acceptance Rates

    CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 51
    • Downloads (last 6 weeks): 8
    Reflects downloads up to 09 Feb 2025

    Cited By

    • (2024) Selection of image classifiers for noisy images through metalearning. Proceedings of the 2024 7th International Conference on Machine Vision and Applications, 92-99. DOI: 10.1145/3653946.3653960. Online publication date: 12-Mar-2024.
    • (2024) ShrinkHPO: Towards Explainable Parallel Hyperparameter Optimization. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4897-4910. DOI: 10.1109/ICDE60146.2024.00371. Online publication date: 13-May-2024.
    • (2024) 3D meta-classification: A meta-learning approach for selecting 3D point-cloud classification algorithm. Information Sciences, 120272. DOI: 10.1016/j.ins.2024.120272. Online publication date: Feb-2024.
    • (2024) Automated algorithm selection using meta-learning and pre-trained deep convolution neural networks. Information Fusion 105, 102210. DOI: 10.1016/j.inffus.2023.102210. Online publication date: May-2024.
    • (2023) Autoencoder-kNN meta-model based data characterization approach for an automated selection of AI algorithms. Journal of Big Data 10:1. DOI: 10.1186/s40537-023-00687-7. Online publication date: 3-Feb-2023.
    • (2023) AutoMRM: A Model Retrieval Method Based on Multimodal Query and Meta-learning. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1228-1237. DOI: 10.1145/3583780.3614787. Online publication date: 21-Oct-2023.
    • (2023) A ModelOps-Based Framework for Intelligent Medical Knowledge Extraction. 2023 IEEE International Conference on Medical Artificial Intelligence (MedAI), 254-259. DOI: 10.1109/MedAI59581.2023.00039. Online publication date: 18-Nov-2023.
    • (2023) TSC-AutoML: Meta-learning for Automatic Time Series Classification Algorithm Selection. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 1032-1044. DOI: 10.1109/ICDE55515.2023.00084. Online publication date: Apr-2023.
    • (2023) EFFECT: Explainable framework for meta-learning in automatic classification algorithm selection. Information Sciences 622, 211-234. DOI: 10.1016/j.ins.2022.11.144. Online publication date: Apr-2023.
    • (2023) CIAMS: clustering indices-based automatic classification model selection. International Journal of Data Science and Analytics. DOI: 10.1007/s41060-023-00441-5. Online publication date: 19-Aug-2023.
