
Dataset Pruning using Evolutionary Optimization

  • Conference paper
Bildverarbeitung für die Medizin 2023 (BVM 2023)

Part of the book series: Informatik aktuell (INFORMAT)


Abstract

Data is key to training deep neural networks, and individual data units are commonly expected to be abundant and diverse. However, what makes a data unit informative, and how the amount of data relates to neural network performance, has barely been investigated. In this study, we use evolutionary algorithms to optimize data usage during deep neural network training. We test multiple medical classification and segmentation datasets, as these are key tasks in medical imaging, and find that this so-called dataset pruning removes rather unimportant data elements. Depending on how strongly we penalized the incorporation of data, we found that, across tasks and datasets, the algorithm itself incorporates a critical amount of data. This shows that future research needs to incorporate not merely abundant data but relevant data.
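The idea of evolving a data selection under a penalty on data usage can be illustrated with a minimal sketch. It is not the authors' implementation: the binary keep/drop mask, the truncation selection, the one-point crossover, the function names, and all parameter values are assumptions made for illustration. In the setting of the abstract, `score_fn` would stand for the validation performance of a network trained on the kept samples; here a hypothetical toy score replaces it so that the example runs on its own.

```python
import random


def fitness(mask, score_fn, penalty):
    """Task score minus a penalty proportional to the fraction of samples kept."""
    kept = [i for i, keep in enumerate(mask) if keep]
    if not kept:
        return float("-inf")  # an empty training set is never acceptable
    return score_fn(kept) - penalty * len(kept) / len(mask)


def evolve_mask(n_samples, score_fn, penalty=0.1, pop_size=30,
                generations=40, mutation_rate=0.05, seed=0):
    """Evolve a boolean keep/drop mask over the dataset (illustrative only)."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_samples)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda m: fitness(m, score_fn, penalty),
                        reverse=True)
        parents = ranked[:pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_samples)  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each gene with a small probability
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, score_fn, penalty))


def toy_score(kept, informative=frozenset(range(10))):
    """Hypothetical stand-in for validation performance: only the first
    ten samples carry information, all others are redundant."""
    return len(informative.intersection(kept)) / len(informative)


best = evolve_mask(40, toy_score, penalty=0.5)
print(sum(best), "of", len(best), "samples kept")
```

A larger `penalty` pushes the search toward smaller subsets, while a smaller one lets redundant samples remain, which mirrors the trade-off described in the abstract.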



Author information

Corresponding author

Correspondence to Andreas M. Kist.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

About this paper


Cite this paper

Neubig, L., Kist, A.M. (2023). Dataset Pruning using Evolutionary Optimization. In: Deserno, T.M., Handels, H., Maier, A., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds) Bildverarbeitung für die Medizin 2023. BVM 2023. Informatik aktuell. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-41657-7_30
