Reusing Your Prepared Data: An Informed Cache for Accelerating DNN Model Training

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14855)


Abstract

In deep learning training, CPU-intensive data preprocessing often becomes a bottleneck: the CPU cannot keep the expensive GPUs fully utilized, which degrades end-to-end training performance. In the typical setup, CPUs perform the preprocessing while GPUs train the model. We propose a new caching algorithm for AI model training, named HCache, which uses a DLT-informed caching approach to improve the reuse of cached data and the utilization of memory. The DLT-informed caching approach makes intelligent caching decisions by identifying data discrepancies across different preprocessing stages. Our evaluation shows that HCache achieves about a 2\(\times\) speedup over the state-of-the-art CoorDL and Quiver when training computer vision models, while maintaining comparable accuracy.
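
The full text is not reproduced on this page, so as a rough illustration only, the sketch below shows the general idea the abstract describes: caching the output of earlier, reusable preprocessing stages so that only the remaining stages are re-executed when a sample is served from the cache. All names here (StageAwareCache, deterministic_fn, stochastic_fn, capacity) and the LRU eviction policy are assumptions made for illustration, not details taken from HCache.

```python
from collections import OrderedDict


class StageAwareCache:
    """Illustrative cache for partially preprocessed training samples.

    Hypothetical sketch only: the class name, arguments, and the LRU
    eviction policy are assumptions, not details from the HCache paper.
    """

    def __init__(self, deterministic_fn, stochastic_fn, capacity):
        self.deterministic_fn = deterministic_fn  # e.g. decode + resize (cacheable)
        self.stochastic_fn = stochastic_fn        # e.g. random crop/flip (never cached)
        self.capacity = capacity                  # maximum number of cached samples
        self._store = OrderedDict()               # sample key -> intermediate result

    def get(self, key, raw_loader):
        """Return a fully preprocessed sample, reusing cached intermediates."""
        if key in self._store:
            # Cache hit: skip the expensive deterministic stages.
            self._store.move_to_end(key)
            intermediate = self._store[key]
        else:
            # Cache miss: run the deterministic stages once and cache the result.
            intermediate = self.deterministic_fn(raw_loader(key))
            self._store[key] = intermediate
            if len(self._store) > self.capacity:
                self._store.popitem(last=False)   # evict the least recently used item
        # Stochastic augmentation is always recomputed, so every epoch still
        # sees freshly augmented samples despite the reuse of intermediates.
        return self.stochastic_fn(intermediate)
```

In a real input pipeline, such a cache would sit between the storage reader and the augmentation stages of the data loader (e.g., calling cache.get(sample_path, read_bytes) per sample); HCache's DLT-informed policy additionally decides which stage's output is worth caching, which this sketch does not attempt to model.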


References

  1. Cloud Tensor Processing Units (TPU). https://cloud.google.com/tpu/docs/tpus

  2. ImageNet-22k dataset. https://opendatalab.com/OpenDataLab/ImageNet-21k

  3. NVIDIA A100. https://www.nvidia.com/en-us/data-center/a100/

  4. Runtime options with memory, CPUs, and GPUs. https://docs.docker.com/config/containers/resource_constraints/

  5. Choi, D., Passos, A., Shallue, C.J., Dahl, G.E.: Faster neural network training with data echoing. arXiv preprint arXiv:1907.05550 (2019)

  6. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

  7. Graur, D., Aymon, D., Kluser, D., Albrici, T., Thekkath, C.A., Klimovic, A.: Cachew: machine learning input data processing as a service. In: 2022 USENIX Annual Technical Conference (USENIX ATC 2022), pp. 689–706 (2022)

  8. Gu, R., et al.: Fluid: dataset abstraction and elastic acceleration for cloud-native deep learning training jobs. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 2182–2195. IEEE (2022)

  9. Guirao, J.A., et al.: Fast AI data preprocessing with NVIDIA DALI. In: GPU Technology Conference (2019)

  10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  11. Kumar, A.V., Sivathanu, M.: Quiver: an informed storage cache for deep learning. In: 18th USENIX Conference on File and Storage Technologies (FAST 2020), pp. 283–296 (2020)

  12. Lee, G., et al.: Refurbish your training data: reusing partially augmented samples for faster deep neural network training. In: 2021 USENIX Annual Technical Conference (USENIX ATC 2021), pp. 537–550 (2021)

  13. Mohan, J., Phanishayee, A., Raniwala, A., Chidambaram, V.: Analyzing and mitigating data stalls in DNN training. arXiv preprint arXiv:2007.06775 (2020)

  14. Agarwal, N., Anil, R., Koren, T., Talwar, K., Zhang, C.: Stochastic optimization with laggard data pipelines. Adv. Neural Inf. Process. Syst. 33 (2020)

  15. Park, P., Jeong, H., Kim, J.: Trainbox: an extreme-scale neural network training server architecture by systematically balancing operations. In: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 825–838. IEEE (2020)

  16. Um, T., Oh, B., Seo, B., Kweun, M., Kim, G., Lee, W.Y.: Fastflow: accelerating deep learning model training with smart offloading of input data pipeline. Proc. VLDB Endow. 16(5), 1086–1099 (2023)

  17. Zhao, M., et al.: Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product. In: Proceedings of the 49th Annual International Symposium on Computer Architecture, pp. 1042–1057 (2022)

Acknowledgement

This work is supported in part by the National Science and Technology Major Project (2021ZD0114300), the National Natural Science Foundation of China (U22A6001), the National Key R&D Program of China (2022YFB4500405, and 2023YFB4503005), and the Zhejiang provincial “Ten Thousand Talents Program” (2021R52007).

Author information

Corresponding authors

Correspondence to Yong Li or Lingfang Zeng.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Han, K., Cheng, W., Li, Y., Wu, Y., Zeng, L., Chen, G. (2024). Reusing Your Prepared Data: An Informed Cache for Accelerating DNN Model Training. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14855. Springer, Singapore. https://doi.org/10.1007/978-981-97-5572-1_34

  • DOI: https://doi.org/10.1007/978-981-97-5572-1_34

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5571-4

  • Online ISBN: 978-981-97-5572-1

  • eBook Packages: Computer Science, Computer Science (R0)
