Mitigating Data Stalls in Deep Learning with Multi-times Data Loading Rule

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13943)

Abstract

With the growth of AI data scale, most deep learning jobs separate data storage from computation. As a result, I/O optimization has become an important part of training optimization for deep learning. Recent studies focus on I/O optimization for deep learning methods that follow a one-time data loading rule, where each data item is used only once per epoch. However, these methods cannot handle jobs with a multi-times data loading rule (e.g., meta-learning). By analyzing the characteristics of multi-times data loading, we design a simple, intuitive, and effective cache replacement strategy called the steady cache strategy. The strategy uses a cache to mitigate data stalls and converts the data placement problem into a 0/1 knapsack problem. To the best of our knowledge, we are the first to mitigate data stalls in AI jobs with a multi-times data loading rule, and our method is also suitable for multi-job scenarios. Our experiments demonstrate that the steady cache strategy achieves a substantial improvement over the LRU strategy.
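
The core technical idea above is that, once each sample's expected reuse under the multi-times data loading rule is estimated, deciding which samples to keep in a fixed-size cache can be framed as a 0/1 knapsack problem. The Python sketch below illustrates that framing only; it is not the authors' implementation. The per-item reuse counts used as knapsack values, the dynamic-programming solver, and the name knapsack_cache_placement are all illustrative assumptions, since the abstract does not specify the actual value function or solver.

# Illustrative sketch only: cache placement cast as a 0/1 knapsack problem.
# Assumption: each candidate item i has a size (e.g., in MB) and a "value"
# approximating how often the multi-times loading rule will reuse it.
def knapsack_cache_placement(sizes, values, capacity):
    """Return indices of items to pin in a cache of the given capacity."""
    n = len(sizes)
    dp = [0] * (capacity + 1)                    # dp[c] = best total value with capacity c
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i in range(n):
        for c in range(capacity, sizes[i] - 1, -1):   # reverse order enforces 0/1 choice
            if dp[c - sizes[i]] + values[i] > dp[c]:
                dp[c] = dp[c - sizes[i]] + values[i]
                keep[i][c] = True
    chosen, c = [], capacity                     # backtrack to recover the chosen items
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            chosen.append(i)
            c -= sizes[i]
    return set(chosen)

if __name__ == "__main__":
    sizes = [3, 4, 2, 5]      # item sizes in MB (made-up numbers)
    reuse = [8, 3, 6, 7]      # assumed per-epoch reuse counts under multi-times loading
    print(knapsack_cache_placement(sizes, reuse, capacity=9))   # -> {0, 1, 2}

Note that a dynamic-programming knapsack like this runs in O(n x capacity) time, so for large caches the capacity would have to be expressed in coarse units (e.g., blocks) to keep the placement decision cheap.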

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62276047).

Author information

Corresponding author

Correspondence to Shuang Liang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, D. et al. (2023). Mitigating Data Stalls in Deep Learning with Multi-times Data Loading Rule. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_37

  • DOI: https://doi.org/10.1007/978-3-031-30637-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30636-5

  • Online ISBN: 978-3-031-30637-2

  • eBook Packages: Computer Science, Computer Science (R0)
