Abstract
Advances in computational power and algorithmic refinements have significantly broadened the impact and applicability of machine learning (ML), particularly in medical imaging. While ML in general relies on extensive datasets to develop accurate, robust, and unbiased models, medical imaging faces unique challenges, including a scarcity of samples and a predominance of poorly annotated, heterogeneous datasets. This heterogeneity manifests in varied acquisition conditions, target populations, data formats, and data structures. The acquisition of large datasets is often further hampered by the incompatibility of source-specific download tools with high-performance computing (HPC) environments. To address these challenges, we introduce the Unified Retrieval Tool (URT), which unifies the acquisition of diverse medical imaging datasets and their standardization to the Brain Imaging Data Structure (BIDS). Currently, downloads from The Cancer Imaging Archive (TCIA), OpenNeuro, and Synapse are supported, easing access to large-scale medical data. URT’s modularity allows straightforward extension to other sources. Moreover, URT’s compatibility with Docker and Singularity enables reproducible research and easy deployment on HPC systems.
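To make the standardization target concrete, the following is a minimal sketch of how a source-specific identifier might be mapped to a BIDS-style relative path. It is not URT's actual implementation: the function names `bids_label` and `bids_path`, the example identifiers, and the choice to simply drop non-alphanumeric characters are assumptions made for illustration. Only the general BIDS layout (`sub-<label>/[ses-<label>/]<datatype>/...`) is taken from the BIDS specification.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def bids_label(raw: str) -> str:
    """Reduce a source identifier to letters and digits only.

    BIDS labels must be alphanumeric; dropping other characters is an
    assumption of this sketch, not a rule taken from URT.
    """
    label = re.sub(r"[^0-9A-Za-z]", "", raw)
    if not label:
        raise ValueError(f"cannot derive a BIDS label from {raw!r}")
    return label


def bids_path(subject: str, datatype: str, suffix: str,
              session: Optional[str] = None,
              ext: str = ".nii.gz") -> PurePosixPath:
    """Build a BIDS-style relative path (hypothetical helper)."""
    sub = f"sub-{bids_label(subject)}"
    directories = [sub]
    entities = [sub]
    if session is not None:
        # The session level is optional in BIDS.
        ses = f"ses-{bids_label(session)}"
        directories.append(ses)
        entities.append(ses)
    directories.append(datatype)
    filename = "_".join(entities + [suffix]) + ext
    return PurePosixPath(*directories) / filename


if __name__ == "__main__":
    # Hypothetical identifiers as they might appear in a source archive.
    print(bids_path("Patient_007", "anat", "T1w", session="2020-01-15"))
```

A converter built along these lines would apply the same mapping to every downloaded series, so that datasets from different archives end up in one predictable directory layout.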
Copyright information
© 2024 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature
About this paper
Cite this paper
Maser, R., Andaloussi, M.A., Lamoline, F., Husch, A. (2024). Unified Retrieval for Streamlining Biomedical Image Dataset Aggregation and Standardization. In: Maier, A., Deserno, T.M., Handels, H., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds) Bildverarbeitung für die Medizin 2024. BVM 2024. Informatik aktuell. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-44037-4_83
DOI: https://doi.org/10.1007/978-3-658-44037-4_83
Publisher Name: Springer Vieweg, Wiesbaden
Print ISBN: 978-3-658-44036-7
Online ISBN: 978-3-658-44037-4
eBook Packages: Computer Science and Engineering (German Language)