An Efficient Data Prefetch Strategy for Deep Learning Based on Non-volatile Memory

Jiang, Wenbin; Liu, Pai; Jin, Hai; Peng, Jing

doi:10.1007/978-3-030-64243-3_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12398)

Included in the following conference series:

International Conference on Green, Pervasive, and Cloud Computing

Abstract

Deep learning (DL) systems usually use asynchronous prefetching to improve data-reading performance. However, the efficiency of the data transfer path from hard disk to DRAM is still limited by disk performance. Emerging non-volatile memory (NVRAM) offers a new solution to this problem, but few existing studies have considered it. We propose an efficient data prefetch strategy for DL based on a heterogeneous memory system that combines NVRAM with DRAM. Benefiting from the large capacity and fast read speed of NVRAM, the strategy uses an asynchronous reading method named sliding NVRAM cache (SNC) to improve the performance of the data transfer paths. A sliding window is applied to map data from disk to NVRAM and to update that data continuously, while the non-ideal write performance of NVRAM is largely mitigated in this strategy. Experiments show that SNC can improve the time performance of training diverse deep neural networks by more than 30%.
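
The general idea of a sliding window that is refilled asynchronously can be illustrated with a minimal Python sketch. This is not the authors' SNC implementation: a bounded in-memory queue stands in for the NVRAM tier, and the names `sliding_prefetch`, `fake_disk_read`, `num_batches`, and `window` are hypothetical, chosen only for illustration. A background thread stages up to `window` batches ahead of the consumer, and each consumed batch frees a slot so the window slides forward.

```python
import queue
import threading
from typing import Callable, Iterator

_DONE = object()  # sentinel marking the end of the dataset


def sliding_prefetch(read_batch: Callable[[int], bytes],
                     num_batches: int,
                     window: int) -> Iterator[bytes]:
    """Yield batches in order while a background thread keeps up to
    `window` batches staged ahead of the consumer.

    Assumption: the bounded queue is only a stand-in for a fast cache
    tier such as NVRAM; `window` must be at least 1.
    """
    staged: "queue.Queue[object]" = queue.Queue(maxsize=window)

    def producer() -> None:
        for index in range(num_batches):
            # put() blocks while the window is full, so the slow tier is
            # read only as fast as the consumer drains the cache.
            staged.put(read_batch(index))
        staged.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = staged.get()  # frees one slot, letting the window slide
        if item is _DONE:
            return
        yield item


def fake_disk_read(index: int) -> bytes:
    """Hypothetical stand-in for a slow disk read."""
    return f"batch-{index}".encode()


result = list(sliding_prefetch(fake_disk_read, num_batches=5, window=2))
```

The sketch omits error handling (an exception inside `read_batch` would leave the consumer waiting) and says nothing about how real NVRAM is allocated or written; it only shows how a fixed-size window decouples slow reads from consumption.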

Acknowledgement

This work is supported by the National Natural Science Foundation of China under grants No. 61832006 and No. 61672250.

Author information

Authors and Affiliations

National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Wenbin Jiang, Pai Liu, Hai Jin & Jing Peng

Corresponding author

Correspondence to Wenbin Jiang.

Editor information

Editors and Affiliations

Northwestern Polytechnical University, Xi’an, China
Zhiwen Yu
University of Mannheim, Mannheim, Germany
Christian Becker
Chinese University of Hong Kong, Shatin, Hong Kong
Guoliang Xing

About this paper

Cite this paper

Jiang, W., Liu, P., Jin, H., Peng, J. (2020). An Efficient Data Prefetch Strategy for Deep Learning Based on Non-volatile Memory. In: Yu, Z., Becker, C., Xing, G. (eds) Green, Pervasive, and Cloud Computing. GPC 2020. Lecture Notes in Computer Science, vol 12398. Springer, Cham. https://doi.org/10.1007/978-3-030-64243-3_8

DOI: https://doi.org/10.1007/978-3-030-64243-3_8
Published: 04 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64242-6
Online ISBN: 978-3-030-64243-3
eBook Packages: Computer Science, Computer Science (R0)