Abstract
As the scale of AI datasets grows, most deep learning jobs separate data storage from computation. Consequently, I/O optimization has become an important issue in deep learning training. Recent studies focus on I/O optimization for deep learning methods that follow a one-time data loading rule, where each data item is used only once per epoch. However, these methods cannot handle jobs that follow a multi-times data loading rule (e.g., meta-learning). By analyzing the characteristics of multi-times data loading, we design a simple, intuitive, and effective cache replacement strategy called the steady cache strategy. This strategy uses a cache to mitigate data stalls and casts the data placement problem as a 0/1 knapsack problem. To the best of our knowledge, we are the first to mitigate data stalls in AI jobs with a multi-times data loading rule, and our method is also suitable for multi-job scenarios. Our experiments demonstrate that the steady cache strategy achieves a substantial improvement over the LRU strategy.
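The abstract does not spell out the exact knapsack formulation, so the following minimal Python sketch is an illustration only: it assumes item sizes act as knapsack weights, expected per-epoch reuse counts act as values, and the cache budget acts as the knapsack capacity. The function name knapsack_placement and all parameters are hypothetical and not the authors' implementation.

# A minimal sketch of casting cache placement as a 0/1 knapsack problem.
# The "weights" (item sizes) and "values" (expected per-epoch reuse counts)
# are illustrative assumptions; the paper's exact formulation is not given here.

def knapsack_placement(sizes, reuse_counts, cache_capacity):
    """Select data items to pin in the cache via 0/1 knapsack dynamic programming.

    sizes[i]        -- storage cost of item i (assumed, in cache units)
    reuse_counts[i] -- expected accesses per epoch for item i (assumed value)
    cache_capacity  -- total cache budget, same units as sizes
    Returns the set of item indices to keep resident in the cache.
    """
    n = len(sizes)
    # dp[c] = best total value achievable with capacity c (items seen so far)
    dp = [0] * (cache_capacity + 1)
    # choice[i][c] = True if item i is taken in the optimum for capacity c
    choice = [[False] * (cache_capacity + 1) for _ in range(n)]
    for i in range(n):
        for c in range(cache_capacity, sizes[i] - 1, -1):
            if dp[c - sizes[i]] + reuse_counts[i] > dp[c]:
                dp[c] = dp[c - sizes[i]] + reuse_counts[i]
                choice[i][c] = True
    # Backtrack to recover the chosen items.
    selected, c = set(), cache_capacity
    for i in range(n - 1, -1, -1):
        if choice[i][c]:
            selected.add(i)
            c -= sizes[i]
    return selected

# Example: three items of sizes 2, 3, 4 with per-epoch reuse 5, 8, 9 and a cache of 5 units.
print(knapsack_placement([2, 3, 4], [5, 8, 9], 5))  # -> {0, 1}

With these assumed inputs, the sketch pins items 0 and 1 (total reuse value 13), which fits the 5-unit budget; any strategy built this way would then evict items outside the selected set rather than following pure recency as LRU does.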
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62276047).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, D. et al. (2023). Mitigating Data Stalls in Deep Learning with Multi-times Data Loading Rule. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_37
DOI: https://doi.org/10.1007/978-3-031-30637-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30636-5
Online ISBN: 978-3-031-30637-2
eBook Packages: Computer Science, Computer Science (R0)