Abstract
Data is key to training deep neural networks. Individual data units are commonly expected to be abundant and diverse. However, what actually makes a data unit informative, and how the amount of data relates to neural network performance, has barely been investigated. In this study, we use evolutionary algorithms to optimize data usage during deep neural network training. We tested multiple medical datasets for classification and segmentation, two key tasks in medical imaging, and found that this so-called dataset pruning removes rather unimportant data elements. Depending on how strongly we penalized the incorporation of data, we found that, across tasks and datasets, the algorithm itself incorporates a critical amount of data. This shows that future research needs to incorporate not merely abundant data but relevant data.
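The abstract does not specify the evolutionary algorithm, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: each candidate is a binary mask over the training samples, and the fitness trades a performance term against a penalty on the fraction of data included. The names `usefulness`, `penalty`, `fitness`, and `evolve_mask` are hypothetical, and the toy performance term merely stands in for training and validating a network on the selected subset.

```python
import random


def fitness(mask, usefulness, penalty):
    """Toy fitness: proxy performance minus a cost for each included sample.

    ASSUMPTION: `usefulness` holds made-up per-sample scores; in a real
    setting this term would come from training a network on the subset.
    """
    included = sum(mask)
    if included == 0:
        return 0.0
    gain = sum(u for m, u in zip(mask, usefulness) if m)
    performance = gain / (1.0 + gain)  # diminishing returns, stays in [0, 1)
    return performance - penalty * included / len(mask)


def evolve_mask(usefulness, penalty, pop_size=30, generations=60,
                mutation_rate=0.05, seed=0):
    """Simple genetic algorithm over binary inclusion masks (illustrative)."""
    rng = random.Random(seed)
    n = len(usefulness)  # requires n >= 2 for one-point crossover

    def score(mask):
        return fitness(mask, usefulness, penalty)

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with offspring.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)


if __name__ == "__main__":
    data_rng = random.Random(1)
    scores = [data_rng.random() for _ in range(40)]  # hypothetical scores
    for lam in (0.0, 0.5, 2.0):
        best = evolve_mask(scores, penalty=lam)
        print(f"penalty={lam}: kept {sum(best)} of {len(best)} samples")
```

Varying `penalty` in such a sketch mimics how strongly the incorporation of data is penalized; the numbers it prints say nothing about the results reported in the paper.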
© 2023 The Author(s), exclusively licensed to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature
About this paper
Cite this paper
Neubig, L., Kist, A.M. (2023). Dataset Pruning using Evolutionary Optimization. In: Deserno, T.M., Handels, H., Maier, A., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds) Bildverarbeitung für die Medizin 2023. BVM 2023. Informatik aktuell. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-41657-7_30
DOI: https://doi.org/10.1007/978-3-658-41657-7_30
Publisher Name: Springer Vieweg, Wiesbaden
Print ISBN: 978-3-658-41656-0
Online ISBN: 978-3-658-41657-7
eBook Packages: Computer Science and Engineering (German Language)