Presentation + Paper
4 April 2022 Sequestration of imaging studies in MIDRC: a multi-institutional data commons
Author Affiliations +
Abstract
The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional effort to accelerate medical imaging machine intelligence research and create a publicly available image repository/commons as well as a sequestered database for performance evaluation and benchmarking of algorithms. After de-identification, approximately 80% of the medical images and associated meta-data will become part of the open repository and 20% will be sequestered and kept separate from the open commons. To ensure that both the public, open dataset and the sequestered dataset are representative of the population available, demographic characteristics across the two datasets must be balanced. Our method uses multidimensional stratified sampling where several demographic variables of interest are sequentially used to separate the data into individual strata, each representing a unique combination of variables. Within each stratum, patients are randomly assigned to the open set (80%) or the sequestered set (20%). Thus, for p variables of interest, the balance of the pdimensional distribution of variable combinations can be controlled. This algorithm was used on an example COVID-19 dataset containing image exams of 4662 patients using the variables of race, age, sex at birth, and ethnicity, each containing 8, 8, 2, and 4 categories, respectively. After stratification of this dataset into the two subsets, resulting distributions of each variable matched the distribution from the original dataset with a maximum percent difference from its original fraction of 0.4%. These results demonstrate that the implemented process of multi-dimensional sequential stratified sampling can partition a large database while maintaining balance across several variables.
Conference Presentation
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Natalie Baughan, Heather M. Whitney, Karen Drukker, Berkman Sahiner, Tingting Hu, Kim Grace Hyun J., Michael McNitt-Gray, Kyle Myers, and Maryellen L. Giger "Sequestration of imaging studies in MIDRC: a multi-institutional data commons", Proc. SPIE 12035, Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment, 120350F (4 April 2022); https://doi.org/10.1117/12.2610239
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Databases

Medical imaging

Algorithm development

Data centers

Artificial intelligence

Medical research

Clinical trials

Back to Top