Abstract
As the scale of AI datasets grows, most deep learning jobs separate data storage from computation. Consequently, I/O optimization has become an important issue in deep learning training. Recent studies focus on I/O optimization for deep learning methods that follow a one-time data loading rule, where each data item is used only once per epoch. However, these methods cannot handle jobs that follow a multi-times data loading rule (e.g., meta-learning). By analyzing the characteristics of multi-times data loading, we design a simple, intuitive, and effective cache replacement strategy called the steady cache strategy. This strategy uses a cache to mitigate data stalls and casts the data placement problem as a 0/1 knapsack problem. To the best of our knowledge, we are the first to mitigate data stalls in AI jobs with a multi-times data loading rule, and our method is also suitable for multi-job scenarios. Our experiments demonstrate that the steady cache strategy achieves a substantial improvement over the LRU strategy.
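The abstract does not spell out the exact knapsack formulation, so the following minimal Python sketch is an illustration only: it assumes item sizes act as knapsack weights, expected per-epoch reuse counts act as values, and the cache budget acts as the knapsack capacity. The function name knapsack_placement and all parameters are hypothetical and not the authors' implementation.

# A minimal sketch of casting cache placement as a 0/1 knapsack problem.
# The "weights" (item sizes) and "values" (expected per-epoch reuse counts)
# are illustrative assumptions; the paper's exact formulation is not given here.

def knapsack_placement(sizes, reuse_counts, cache_capacity):
    """Select data items to pin in the cache via 0/1 knapsack dynamic programming.

    sizes[i]        -- storage cost of item i (assumed, in cache units)
    reuse_counts[i] -- expected accesses per epoch for item i (assumed value)
    cache_capacity  -- total cache budget, same units as sizes
    Returns the set of item indices to keep resident in the cache.
    """
    n = len(sizes)
    # dp[c] = best total value achievable with capacity c (items seen so far)
    dp = [0] * (cache_capacity + 1)
    # choice[i][c] = True if item i is taken in the optimum for capacity c
    choice = [[False] * (cache_capacity + 1) for _ in range(n)]
    for i in range(n):
        for c in range(cache_capacity, sizes[i] - 1, -1):
            if dp[c - sizes[i]] + reuse_counts[i] > dp[c]:
                dp[c] = dp[c - sizes[i]] + reuse_counts[i]
                choice[i][c] = True
    # Backtrack to recover the chosen items.
    selected, c = set(), cache_capacity
    for i in range(n - 1, -1, -1):
        if choice[i][c]:
            selected.add(i)
            c -= sizes[i]
    return selected

# Example: three items of sizes 2, 3, 4 with per-epoch reuse 5, 8, 9 and a cache of 5 units.
print(knapsack_placement([2, 3, 4], [5, 8, 9], 5))  # -> {0, 1}

With these assumed inputs, the sketch pins items 0 and 1 (total reuse value 13), which fits the 5-unit budget; any strategy built this way would then evict items outside the selected set rather than following pure recency as LRU does.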
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62276047).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, D. et al. (2023). Mitigating Data Stalls in Deep Learning with Multi-times Data Loading Rule. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13943. Springer, Cham. https://doi.org/10.1007/978-3-031-30637-2_37
DOI: https://doi.org/10.1007/978-3-031-30637-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30636-5
Online ISBN: 978-3-031-30637-2
eBook Packages: Computer Science, Computer Science (R0)