
Unified Retrieval for Streamlining Biomedical Image Dataset Aggregation and Standardization

Conference paper, Bildverarbeitung für die Medizin 2024 (BVM 2024)

Part of the book series: Informatik aktuell (INFORMAT)


Abstract

Advances in computational power and algorithmic refinements have significantly amplified the impact and applicability of machine learning (ML), particularly in medical imaging. While ML generally thrives on extensive datasets for developing accurate, robust, and unbiased models, medical imaging faces unique challenges, including a scarcity of samples and a predominance of poorly annotated, heterogeneous datasets. This heterogeneity manifests in varied acquisition conditions, target populations, data formats, and data structures. Acquiring large datasets is further hampered by compatibility issues between source-specific download tools and high-performance computing (HPC) environments. To address these challenges, we introduce the Unified Retrieval Tool (URT), which unifies the acquisition of diverse medical imaging datasets and standardizes them to the Brain Imaging Data Structure (BIDS). Currently, downloads from The Cancer Imaging Archive (TCIA), OpenNeuro, and Synapse are supported, easing access to large-scale medical data. URT’s modularity allows straightforward extension to other sources. Moreover, URT’s compatibility with Docker and Singularity enables reproducible research and easy deployment on HPC systems.



Author information

Corresponding author: Andreas Husch.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

About this paper


Cite this paper

Maser, R., Andaloussi, M.A., Lamoline, F., Husch, A. (2024). Unified Retrieval for Streamlining Biomedical Image Dataset Aggregation and Standardization. In: Maier, A., Deserno, T.M., Handels, H., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds) Bildverarbeitung für die Medizin 2024. BVM 2024. Informatik aktuell. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-44037-4_83
