A Study on Non-volatile 3D Stacked Memory for Big Data Applications

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9528)

Abstract

Big data processing has recently become an increasingly important field of computer applications, attracting considerable attention from academia and industry. However, it worsens the memory wall problem in processor design, that is, the large performance gap between processor computation and memory access. Stacked memory structures offer potential benefits for future processor design, such as low latency, large capacity, and high bandwidth. Because these benefits can effectively relieve the memory wall problem, stacked memory has become a promising architectural technique. Such structures have begun to use non-volatile memory (NVM) to provide faster and larger memory, but their memory access behaviour for big data applications has not been fully studied. To better understand this memory performance, this paper analyses the NVM 3D stacked structure by simulation. Since flash memory is the most mature NVM medium, this paper uses flash memory as the NVM part of the stacked structure, which results in a processor architecture with tightly connected CPU, DRAM, and flash layers. In our experiments, the test variables are channel number, capacity, page size, and read and write latency. From the evaluation results of eight programs from a big data program set, we conclude that bandwidth and capacity have a significant effect on big data applications, and that as bandwidth and capacity increase, the read/write latency of flash and the page size have less effect. We also point out some problems concerning data consistency, channel selection, read and write strategy, and data granularity selection. These analysis results are useful for further study and optimization of the NVM 3D stacked structure.

Acknowledgements

This research was partially funded by NSF grants (No. 61433019, No. 61472435, and No. 61572508), HPNSFC grant (No. 12JJ4070), and DFMEC grant (No. 20114307120010).

Author information

Corresponding author

Correspondence to Libo Huang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Qian, C., Huang, L., Xie, P., Xiao, N., Wang, Z. (2015). A Study on Non-volatile 3D Stacked Memory for Big Data Applications. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9528. Springer, Cham. https://doi.org/10.1007/978-3-319-27119-4_8

  • DOI: https://doi.org/10.1007/978-3-319-27119-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27118-7

  • Online ISBN: 978-3-319-27119-4

  • eBook Packages: Computer Science, Computer Science (R0)
