Unified Retrieval for Streamlining Biomedical Image Dataset Aggregation and Standardization

Maser, Raphael; Andaloussi, Meryem Abbad; Lamoline, François; Husch, Andreas

doi:10.1007/978-3-658-44037-4_83

Part of the book series: Informatik aktuell (INFORMAT)

Included in the following conference series:

BVM Workshop

Abstract

Advances in computational power and algorithmic refinements have significantly increased the impact and applicability of machine learning (ML), particularly in medical imaging. While ML in general relies on extensive datasets to develop accurate, robust, and unbiased models, medical imaging faces unique challenges, including a scarcity of samples and a predominance of poorly annotated, heterogeneous datasets. This heterogeneity manifests in varied acquisition conditions, target populations, data formats, and structures. Acquiring large datasets is often further hampered by incompatibilities between source-specific download tools and high-performance computing (HPC) environments. To address these challenges, we introduce the Unified Retrieval Tool (URT), which unifies the acquisition of diverse medical imaging datasets and their standardization to the Brain Imaging Data Structure (BIDS). Currently, downloads from The Cancer Imaging Archive (TCIA), OpenNeuro, and Synapse are supported, easing access to large-scale medical data. URT’s modular design allows straightforward extension to other sources. Moreover, URT’s compatibility with Docker and Singularity enables reproducible research and easy deployment on HPC systems.
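
To make the ideas of a modular source layer and a BIDS target layout concrete, the following is a minimal sketch and not URT's actual code or API. All names in it (Source, SOURCES, register, DummySource, bids_path) are hypothetical and chosen only for illustration. It assumes that each source is a small class behind a common interface, and it builds file paths following the BIDS naming pattern sub-<label>[/ses-<label>]/<datatype>/<entities>_<suffix><extension>.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Optional, Type


class Source(ABC):
    """Hypothetical interface: one subclass per data source."""

    @abstractmethod
    def fetch(self, dataset_id: str, dest: Path) -> List[Path]:
        """Place the raw files of dataset_id in dest and return their paths."""


# Hypothetical registry mapping a source name to its implementation.
SOURCES: Dict[str, Type[Source]] = {}


def register(name: str):
    """Class decorator that adds a Source subclass to the registry."""
    def wrap(cls: Type[Source]) -> Type[Source]:
        SOURCES[name] = cls
        return cls
    return wrap


@register("dummy")
class DummySource(Source):
    """Stand-in source: creates an empty file instead of downloading."""

    def fetch(self, dataset_id: str, dest: Path) -> List[Path]:
        dest.mkdir(parents=True, exist_ok=True)
        raw = dest / f"{dataset_id}_scan.nii.gz"
        raw.touch()
        return [raw]


def bids_path(root: Path, subject: str, datatype: str, suffix: str,
              session: Optional[str] = None, ext: str = ".nii.gz") -> Path:
    """Build a BIDS-style path for one file below root."""
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    filename = "_".join(entities) + f"_{suffix}{ext}"
    # Directory levels mirror the entities, followed by the datatype folder.
    return root.joinpath(*entities, datatype, filename)
```

In this sketch, supporting a further source would only require another decorated subclass of Source. A call such as bids_path(Path("out"), "01", "anat", "T1w") yields out/sub-01/anat/sub-01_T1w.nii.gz.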

Author information

Authors and Affiliations

Esch-Belval Belvaux, Luxembourg
Raphael Maser, Meryem Abbad Andaloussi, François Lamoline & Andreas Husch

Corresponding author

Correspondence to Andreas Husch.

Editor information

Editors and Affiliations

Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Bayern, Deutschland
Andreas Maier
Peter L. Reichertz Institut für Medizinische Informatik, Technische Universität Braunschweig, Braunschweig, Niedersachsen, Deutschland
Thomas M. Deserno
Institut für Medizinische Informatik, Universität zu Lübeck, Lübeck, Schleswig-Holstein, Deutschland
Heinz Handels
Medical Image Computing, E230, Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Baden-Württemberg, Deutschland
Klaus Maier-Hein
Informatik und Mathematik, OTH Regensburg, Regensburg, Deutschland
Christoph Palm
Institut für Medizinische Informatik, Charité - Universitätsmedizin Berlin, Berlin, Berlin, Deutschland
Thomas Tolxdorff

About this paper

Cite this paper

Maser, R., Andaloussi, M.A., Lamoline, F., Husch, A. (2024). Unified Retrieval for Streamlining Biomedical Image Dataset Aggregation and Standardization. In: Maier, A., Deserno, T.M., Handels, H., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds) Bildverarbeitung für die Medizin 2024. BVM 2024. Informatik aktuell. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-44037-4_83

DOI: https://doi.org/10.1007/978-3-658-44037-4_83
Published: 20 February 2024
Publisher Name: Springer Vieweg, Wiesbaden
Print ISBN: 978-3-658-44036-7
Online ISBN: 978-3-658-44037-4
eBook Packages: Computer Science and Engineering (German Language)
