Abstract
While clinical trials are the state-of-the-art method for comparatively assessing the effect of new medication, benchmarking in medical image analysis is performed through so-called challenges. A recent comprehensive analysis of multiple biomedical image analysis challenges revealed large discrepancies between the impact of challenges and the quality control of their design and reporting standards. This work follows up on these results and addresses the specific question of the reproducibility of the participants' methods. To determine whether alternative interpretations of the method descriptions may change the challenge ranking, we reproduced the algorithms submitted to the 2019 Robust medical image segmentation challenge (ROBUST-MIS). The leaderboard differed substantially between the original challenge and the reimplementation, indicating that challenge rankings may not be sufficiently reproducible.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature
About this paper
Cite this paper
Reinke, A., Grab, G., Maier-Hein, L. (2023). Challenge Results are not Reproducible. In: Deserno, T.M., Handels, H., Maier, A., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds) Bildverarbeitung für die Medizin 2023. BVM 2023. Informatik aktuell. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-41657-7_43
DOI: https://doi.org/10.1007/978-3-658-41657-7_43
Publisher Name: Springer Vieweg, Wiesbaden
Print ISBN: 978-3-658-41656-0
Online ISBN: 978-3-658-41657-7
eBook Packages: Computer Science and Engineering (German Language)