
A Hierarchical Dissimilarity Metric for Automated Machine Learning Pipelines, and Visualizing Search Behaviour

Conference paper, Applications of Evolutionary Computation (EvoApplications 2024)

Abstract

This study addresses the challenge of developing a dissimilarity metric for machine learning pipeline optimization. Traditional approaches, limited to simplified operator sets and pipeline structures, fail to capture the full complexity of this task. Two novel metrics are proposed for measuring structural and hyperparameter dissimilarity in the decision space. A hierarchical approach is employed to integrate these metrics, prioritizing structural differences over hyperparameter differences. The Tree-based Pipeline Optimization Tool (TPOT) is used as the primary automated machine learning framework and is applied to the abalone dataset. Novel visual representations of TPOT's search dynamics are also proposed, providing deeper insight into its behaviour and evolutionary trajectories under different search conditions. The effects of altering the population selection mechanism and reducing the population size are explored, highlighting how these methods enhance the understanding of automated machine learning pipeline optimization.
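The abstract does not give the metric definitions, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. A pipeline is modeled as a tree of `Node` objects (a hypothetical class) holding an operator name and hyperparameters. Structural dissimilarity is a Selkow-style top-down tree edit distance with unit costs (a swapped-in technique); hyperparameter dissimilarity is the fraction of differing hyperparameter values over nodes matched by position and operator (also an assumption). The hierarchy is enforced by adding half of the hyperparameter term, which lies in [0, 1], to the integer structural distance, so any structural change outweighs all hyperparameter changes. The names `structural`, `hyper` and `hierarchical` are illustrative only.

```python
from dataclasses import dataclass


# Hypothetical pipeline representation (an assumption, not TPOT's internal format):
# each node holds an operator name, its hyperparameters as (name, value) pairs,
# and its input subtrees.
@dataclass(frozen=True)
class Node:
    op: str
    params: tuple = ()
    children: tuple = ()


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def structural(a: Node, b: Node) -> int:
    """Selkow-style top-down tree edit distance with unit costs.

    Relabelling a node costs 1; deleting or inserting a child subtree costs
    its size. Child sequences are aligned with dynamic programming.
    """
    cost = 0 if a.op == b.op else 1
    xs, ys = a.children, b.children
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])  # delete child subtree
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])  # insert child subtree
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),
                dp[i][j - 1] + size(ys[j - 1]),
                dp[i - 1][j - 1] + structural(xs[i - 1], ys[j - 1]),
            )
    return cost + dp[len(xs)][len(ys)]


def _param_counts(a: Node, b: Node) -> tuple:
    """(differing, total) hyperparameter counts over nodes matched top-down
    by position, stopping wherever the operators differ."""
    if a.op != b.op:
        return 0, 0
    pa, pb = dict(a.params), dict(b.params)
    keys = pa.keys() | pb.keys()
    diff = sum(pa.get(k) != pb.get(k) for k in keys)
    total = len(keys)
    for ca, cb in zip(a.children, b.children):
        d, t = _param_counts(ca, cb)
        diff, total = diff + d, total + t
    return diff, total


def hyper(a: Node, b: Node) -> float:
    """Fraction of compared hyperparameters whose values differ, in [0, 1]."""
    diff, total = _param_counts(a, b)
    return diff / total if total else 0.0


def hierarchical(a: Node, b: Node) -> float:
    """Structural distance plus half the hyperparameter term.

    Because structural() is integer-valued and 0.5 * hyper() <= 0.5, any
    structural difference outranks every hyperparameter difference.
    """
    return structural(a, b) + 0.5 * hyper(a, b)


# Illustrative pipelines (operator names are just labels here).
data = Node("input")
p1 = Node("RandomForestRegressor", (("max_depth", 5),), (Node("StandardScaler", (), (data,)),))
p2 = Node("RandomForestRegressor", (("max_depth", 8),), (Node("StandardScaler", (), (data,)),))
p3 = Node("RandomForestRegressor", (("max_depth", 5),), (Node("MinMaxScaler", (), (data,)),))

print(hierarchical(p1, p2))  # 0.5: same structure, different max_depth
print(hierarchical(p1, p3))  # 1.0: one operator relabelled
```

Here `p1` and `p2` differ only in `max_depth`, while `p1` and `p3` differ by one preprocessing operator, so the structural change ranks as more dissimilar. Comparing `(structural, hyper)` tuples lexicographically would give the same ordering; the scalar form is shown because projection methods such as multidimensional scaling expect a single dissimilarity value per pair of pipelines.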



Correspondence to Angus Kenny.


Electronic supplementary material: Supplementary material 1 (PDF, 66274 KB).


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kenny, A., Ray, T., Limmer, S., Singh, H.K., Rodemann, T., Olhofer, M. (2024). A Hierarchical Dissimilarity Metric for Automated Machine Learning Pipelines, and Visualizing Search Behaviour. In: Smith, S., Correia, J., Cintrano, C. (eds) Applications of Evolutionary Computation. EvoApplications 2024. Lecture Notes in Computer Science, vol 14635. Springer, Cham. https://doi.org/10.1007/978-3-031-56855-8_7


  • DOI: https://doi.org/10.1007/978-3-031-56855-8_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56854-1

  • Online ISBN: 978-3-031-56855-8

  • eBook Packages: Computer Science, Computer Science (R0)
