DOI:10.1145/3319619.3326899
research-article

Towards better estimation of statistical significance when comparing evolutionary algorithms

Published: 13 July 2019

Abstract

Well-established statistical testing procedures, when used to compare the performance of evolutionary algorithms, often yield pessimistic results. Obtaining results with the necessary precision then requires more independent samples, and thus more computation time.
We aim to improve this situation by developing statistical tests that are well suited to the typical questions arising in the benchmarking of evolutionary algorithms. Our first step, presented in this paper, is a procedure that determines whether the performance distributions of two given algorithms are identical on each of the benchmarks. Our experimental study shows that this procedure can detect very small differences in the performance of algorithms while requiring computational budgets that are an order of magnitude smaller (e.g. 15x) than those of existing approaches.
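For orientation, the question posed above (are the two performance distributions identical on every benchmark?) can be approached conventionally as sketched below. This is not the procedure proposed in the paper. It is a minimal illustration that assumes a two-sample Kolmogorov-Smirnov statistic, a permutation estimate of the p-value, and a Bonferroni correction across benchmarks. The function names, the `rounds` and `seed` parameters, and the dictionary layout of the inputs are all hypothetical.

```python
import random
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Largest gap between the empirical CDFs of the two samples."""
    sx, sy = sorted(xs), sorted(ys)
    return max(
        abs(bisect_right(sx, v) / len(sx) - bisect_right(sy, v) / len(sy))
        for v in sx + sy
    )


def permutation_p_value(xs, ys, rounds=2000, rng=None):
    """Estimate P(statistic >= observed) under random relabelling of the runs."""
    rng = rng or random.Random(0)
    observed = ks_statistic(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (rounds + 1)


def differs_on_some_benchmark(runs_a, runs_b, alpha=0.05, rounds=2000, seed=0):
    """Test the global null "identical distributions on every benchmark".

    runs_a and runs_b map a benchmark name to a list of performance values
    (assumed layout). The null is rejected if any per-benchmark p-value
    falls below the Bonferroni-corrected threshold alpha / #benchmarks.
    """
    rng = random.Random(seed)
    threshold = alpha / len(runs_a)
    p_values = {
        name: permutation_p_value(runs_a[name], runs_b[name], rounds, rng)
        for name in runs_a
    }
    return any(p <= threshold for p in p_values.values()), p_values
```

Because the significance threshold shrinks as the number of benchmarks grows, each individual comparison needs more runs to reach it. That is one way the cost of conventional corrections shows up in practice.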

Cited By

  • (2021) Reproducibility in Evolutionary Computation. ACM Transactions on Evolutionary Learning and Optimization 1:4 (1-21). DOI:10.1145/3466624. Online publication date: 13-Oct-2021

Published In

GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2019
2161 pages
ISBN:9781450367486
DOI:10.1145/3319619

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multiple comparisons
  2. statistical significance

Qualifiers

  • Research-article

Conference

GECCO '19
GECCO '19: Genetic and Evolutionary Computation Conference
July 13 - 17, 2019
Prague, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 8
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 05 Mar 2025