Challenge Results are not Reproducible

  • Conference paper
Bildverarbeitung für die Medizin 2023 (BVM 2023)

Part of the book series: Informatik aktuell (INFORMAT)

Abstract

While clinical trials are the state-of-the-art method for comparatively assessing the effect of new medication, benchmarking in medical image analysis is performed through so-called challenges. A recent comprehensive analysis of multiple biomedical image analysis challenges revealed large discrepancies between the impact of challenges and the quality control of their design and reporting standards. This work follows up on these results and addresses the specific question of the reproducibility of the participants' methods. To determine whether alternative interpretations of the method descriptions may change the challenge ranking, we reproduced the algorithms submitted to the 2019 Robust Medical Instrument Segmentation challenge (ROBUST-MIS). The leaderboard differed substantially between the original challenge and the reimplementation, indicating that challenge rankings may not be sufficiently reproducible.
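One way to put a number on how much two leaderboards differ is a rank correlation such as Kendall's tau, which appears in the reference list. The abstract does not say which measure the authors used, so the following is only a minimal sketch under that assumption. The function name `kendall_tau`, the team names, and the ranks are hypothetical and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same teams (no ties assumed).

    Both arguments map a team name to its rank (1 = best).
    Returns a value in [-1, 1]; 1 means identical order, -1 means reversed order.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same teams")
    if len(rank_a) < 2:
        raise ValueError("at least two teams are needed")

    concordant = 0
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant if both rankings order the two teams the same way.
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks, invented purely for illustration.
original = {"team_a": 1, "team_b": 2, "team_c": 3, "team_d": 4}
reimplemented = {"team_a": 2, "team_b": 1, "team_c": 4, "team_d": 3}

tau = kendall_tau(original, reimplemented)
print(f"Kendall's tau between the two leaderboards: {tau:.3f}")
```

A value close to 1 would suggest that the reimplementation preserves the original order, while lower values indicate that teams swap places. With tied ranks, a tie-aware variant such as tau-b would be the more appropriate choice.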

References

  1. Siddique N et al. U-net and its variants for medical image segmentation: a review of theory and applications. IEEE Access. 2021.

  2. Maier-Hein L et al. Why rankings of biomedical image analysis competitions should be interpreted with care. Nat Commun. 2018;9(1):1–13.

  3. Ross T et al. Comparative validation of multi-instance instrument segmentation in endoscopy: results of the ROBUST-MIS 2019 challenge. Med Image Anal. 2021;70:101920.

  4. Dice LR. Measures of the amount of ecologic association between species. Ecology. 1945;26(3):297–302.

  5. Wiesenfarth M et al. Methods and open-source toolkit for analyzing and visualizing challenge results. Sci Rep. 2021;11(1):1–15.

  6. Kendall MG. A new measure of rank correlation. Biometrika. 1938;30(1/2):81–93.

  7. Pham HV et al. Problems and opportunities in training deep learning software systems: an analysis of variance. Proc IEEE/ACM Int Conf Autom Softw Eng Workshops. 2020:771–83.

Author information

Corresponding author

Correspondence to Annika Reinke.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

About this paper

Cite this paper

Reinke, A., Grab, G., Maier-Hein, L. (2023). Challenge Results are not Reproducible. In: Deserno, T.M., Handels, H., Maier, A., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds) Bildverarbeitung für die Medizin 2023. BVM 2023. Informatik aktuell. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-41657-7_43
