Dataset Distillation by Automatic Training Trajectories

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Dataset distillation is used to create a concise yet informative synthetic dataset that can replace the original dataset for training purposes. Some leading methods in this domain prioritize long-range matching, unrolling training trajectories for a fixed number of steps (\(N_{S}\)) on the synthetic dataset to align them with various expert training trajectories. However, traditional long-range matching methods suffer from an overfitting-like problem: the fixed step count \(N_{S}\) forces the synthetic dataset to conform distortedly to the expert training trajectories seen during distillation, resulting in a loss of generality, especially toward trajectories from unencountered architectures. We refer to this as the Accumulated Mismatching Problem (AMP) and propose a new approach, Automatic Training Trajectories (ATT), which dynamically and adaptively adjusts the trajectory length \(N_{S}\) to address the AMP. Our method outperforms existing methods, particularly in cross-architecture tests. Moreover, owing to its adaptive nature, it exhibits enhanced stability in the face of parameter variations. Our source code is publicly available at https://github.com/NiaLiu/ATT.
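
To make the matching objective concrete, the following is a minimal PyTorch sketch of long-range trajectory matching with an adaptively selected trajectory length, in the spirit of ATT. It is an illustration under stated assumptions rather than the authors' implementation (see the repository above for that): the tiny stand-in network, the randomly perturbed "expert" target, and helper names such as `att_matching_loss` and `flat` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call


def flat(params):
    # Flatten a dict of parameter tensors into a single vector.
    return torch.cat([p.reshape(-1) for p in params.values()])


def att_matching_loss(net, syn_x, syn_y, theta_start, theta_target,
                      n_max=20, lr_inner=0.01):
    """Unroll up to n_max SGD steps on the synthetic data and return the
    matching loss at the step whose parameters land closest to the expert
    target, rather than always at a fixed step count N_S (hypothetical
    sketch, not the released ATT code)."""
    params = {k: v.clone().requires_grad_(True) for k, v in theta_start.items()}
    target = flat(theta_target)
    # Normalizer: distance the expert itself traveled over the segment.
    denom = (flat(theta_start) - target).pow(2).sum().detach()

    best = None
    for _ in range(n_max):
        out = functional_call(net, params, (syn_x,))
        grads = torch.autograd.grad(F.cross_entropy(out, syn_y),
                                    list(params.values()), create_graph=True)
        # Differentiable inner SGD step on the synthetic batch.
        params = {k: p - lr_inner * g
                  for (k, p), g in zip(params.items(), grads)}
        loss = (flat(params) - target).pow(2).sum() / denom
        # Keep the best-matching unrolled step; torch.minimum stays
        # differentiable through whichever argument is smaller.
        best = loss if best is None else torch.minimum(best, loss)
    return best  # differentiable w.r.t. syn_x


# Usage: optimize the synthetic images by descending the matching loss.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
theta_start = {k: v.detach().clone() for k, v in net.named_parameters()}
# A perturbed copy stands in for the expert's later checkpoint here.
theta_target = {k: v + 0.01 * torch.randn_like(v)
                for k, v in theta_start.items()}

syn_x = torch.randn(10, 3, 32, 32, requires_grad=True)  # learnable images
syn_y = torch.arange(10)                                 # one image per class
opt = torch.optim.SGD([syn_x], lr=0.1)

opt.zero_grad()
loss = att_matching_loss(net, syn_x, syn_y, theta_start, theta_target)
loss.backward()
opt.step()
print(f"matching loss: {loss.item():.4f}")
```

Instead of always taking the matching loss after a fixed \(N_{S}\) unrolled steps, the sketch keeps the loss at the unrolled step that lands closest to the expert checkpoint, which is the adaptive behavior the abstract describes.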

Acknowledgment

This work is funded by the Bayerische Forschungsstiftung under the research grant "Von der Edge zur Cloud und zurück: Skalierbare und Adaptive Sensordatenverarbeitung" ("From the Edge to the Cloud and Back: Scalable and Adaptive Sensor Data Processing", AZ-1468-20), and supported by AI systems hosted and operated by the Leibniz-Rechenzentrum (LRZ) der Bayerischen Akademie der Wissenschaften. Furthermore, part of the results were obtained on systems in the test environment BEAST (Bavarian Energy Architecture & Software Testbed) at the Leibniz Supercomputing Centre.

Author information

Corresponding author

Correspondence to Dai Liu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 7265 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, D., Gu, J., Cao, H., Trinitis, C., Schulz, M. (2025). Dataset Distillation by Automatic Training Trajectories. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15145. Springer, Cham. https://doi.org/10.1007/978-3-031-73021-4_20

  • DOI: https://doi.org/10.1007/978-3-031-73021-4_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73020-7

  • Online ISBN: 978-3-031-73021-4

  • eBook Packages: Computer Science, Computer Science (R0)
