Towards better estimation of statistical significance when comparing evolutionary algorithms

ABSTRACT
The use of well-established statistical testing procedures to compare the performance of evolutionary algorithms often yields pessimistic results: obtaining conclusions with the necessary precision requires increasing the number of independent samples, and hence the computation time.

We aim to improve this situation by developing statistical tests that are well suited to the questions that typically arise when benchmarking evolutionary algorithms. Our first step, presented in this paper, is a procedure that determines whether the performance distributions of two given algorithms are identical on each of the benchmarks. Our experimental study shows that this procedure can detect very small differences in algorithm performance while requiring computational budgets that are an order of magnitude smaller (e.g., 15x) than those of existing approaches.
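To make the baseline concrete, the following is a minimal sketch of the conventional per-benchmark testing approach that the paper aims to improve upon: a two-sided Mann-Whitney U (rank-sum) test applied independently on each benchmark, with a Bonferroni correction for testing several benchmarks at once. The benchmark names and the synthetic "runtime" samples below are illustrative assumptions, not data from the paper.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Hypothetical runtime samples of two algorithms on three benchmarks
# (synthetic data for illustration only).
benchmarks = {
    "OneMax":      (rng.normal(100, 10, 50), rng.normal(100, 10, 50)),  # identical
    "LeadingOnes": (rng.normal(200, 20, 50), rng.normal(230, 20, 50)),  # clear gap
    "BinVal":      (rng.normal(150, 15, 50), rng.normal(153, 15, 50)),  # tiny gap
}

alpha = 0.05

# Two-sided rank-sum test per benchmark.
raw = {name: mannwhitneyu(a, b, alternative="two-sided").pvalue
       for name, (a, b) in benchmarks.items()}

# Bonferroni correction over the family of benchmarks.
significant = {name: p * len(raw) < alpha for name, p in raw.items()}

for name in benchmarks:
    print(f"{name}: p={raw[name]:.4g} significant={significant[name]}")
```

Note how the tiny-gap benchmark tends to slip below the significance threshold at this sample size: the per-benchmark test plus a conservative correction is exactly the "pessimistic" behavior described above, which forces practitioners to draw many more independent samples.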