A Study on Non-volatile 3D Stacked Memory for Big Data Applications

Qian, Cheng; Huang, Libo; Xie, Peng; Xiao, Nong; Wang, Zhiying

doi:10.1007/978-3-319-27119-4_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9528)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Big data processing has become an increasingly important field of computer applications and has attracted considerable attention from academia and industry. However, it worsens the memory wall problem in processor design, that is, the large performance gap between processor computation and memory access. Stacked memory structures offer potential benefits for future processor designs, such as low latency, large capacity, and high bandwidth. Because these benefits can effectively relieve the memory wall problem, stacked memory has become a promising architectural technique. Such structures have begun to use non-volatile memory (NVM) to provide faster and larger memory, but their memory access behaviour for big data applications has not been fully studied. To better understand this behaviour, this paper analyses the NVM 3D stacked structure through simulation. Since flash memory is the most mature NVM medium, this paper uses flash as the NVM part of the stacked structure, which yields a processor architecture with tightly connected CPU, DRAM and flash layers. In our experiments, the test variables are the channel number, capacity, page size, and read and write latency. From the evaluation results of eight programs drawn from a big data program set, we conclude that bandwidth and capacity have a significant effect on big data applications, and that as bandwidth and capacity increase, the read/write latency of flash and the page size have less effect. We also point out open problems concerning data consistency, channel selection, read and write strategy, and data granularity selection. These analysis results are useful for further study and optimization of the NVM 3D stacked structure.
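The test variables named in the abstract (channel number, capacity, page size, and read and write latency) can be pictured as a configuration grid. The sketch below is only an illustration of such a sweep, not the simulation method used in the paper: the names `FlashConfig`, `transfer_time_us` and `sweep`, the first-order timing model, the per-channel bus rate, and all numeric values are assumptions introduced here.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class FlashConfig:
    """One hypothetical point in the parameter space (all fields are assumptions)."""
    channels: int            # number of independent flash channels
    capacity_gib: int        # total flash capacity; not used by the timing model
    page_bytes: int          # flash page size in bytes
    read_latency_us: float   # per-page read latency in microseconds
    write_latency_us: float  # per-page write latency in microseconds


def transfer_time_us(cfg, nbytes, write=False, channel_mb_per_s=400.0):
    """Toy first-order estimate of the time to move nbytes.

    Pages are assumed to be striped evenly across channels, so each
    "round" serves up to cfg.channels pages in parallel.
    """
    pages = ceil(nbytes / cfg.page_bytes)
    rounds = ceil(pages / cfg.channels)
    latency = cfg.write_latency_us if write else cfg.read_latency_us
    # bytes / (10^6 bytes per second) gives microseconds directly
    bus_us = cfg.page_bytes / channel_mb_per_s
    return rounds * (latency + bus_us)


def sweep(channels=(1, 2, 4, 8),
          capacities_gib=(16, 64),
          page_sizes=(4096, 8192),
          read_latencies_us=(25.0, 50.0),
          write_latencies_us=(200.0,)):
    """Yield every combination of the hypothetical parameter values."""
    for c, cap, p, r, w in product(channels, capacities_gib, page_sizes,
                                   read_latencies_us, write_latencies_us):
        yield FlashConfig(c, cap, p, r, w)


if __name__ == "__main__":
    # Estimate the time to read 1 MiB under each configuration.
    for cfg in sweep():
        print(cfg, round(transfer_time_us(cfg, 1 << 20), 2))
```

In this toy model capacity does not affect timing at all; a trace-driven simulator would be needed to capture capacity effects such as hit rates in the flash layer.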

Acknowledgements

This research was partially funded by NSF grants (No. 61433019, No. 61472435, and No. 61572508), HPNSFC grant (No. 12JJ4070), and DFMEC grant (20114307120010).

Author information

Authors and Affiliations

State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, 410073, China
Cheng Qian, Libo Huang, Peng Xie, Nong Xiao & Zhiying Wang

Corresponding author

Correspondence to Libo Huang.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Guojun Wang
The University of Sydney, Sydney, New South Wales, Australia
Albert Zomaya
University of Murcia, Murcia, Murcia, Spain
Gregorio Martinez
Hunan University, Changsha, China
Kenli Li

About this paper

Cite this paper

Qian, C., Huang, L., Xie, P., Xiao, N., Wang, Z. (2015). A Study on Non-volatile 3D Stacked Memory for Big Data Applications. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9528. Springer, Cham. https://doi.org/10.1007/978-3-319-27119-4_8

DOI: https://doi.org/10.1007/978-3-319-27119-4_8
Published: 16 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27118-7
Online ISBN: 978-3-319-27119-4
eBook Packages: Computer Science, Computer Science (R0)