Adaptive correlated prefetch with large-scale hybrid memory system for stream processing

The Journal of Supercomputing

Abstract

Owing to the exponential growth in real-time data generation, stream processing is becoming increasingly important. However, stream processing follows a data processing paradigm quite different from that of conventional data center workloads, so the memory systems used in existing data centers cannot be expected to deliver high performance for it. To solve this problem, this paper proposes two main solutions. First, a hybrid main memory with a small buffer is designed to reflect the execution characteristics of stream processing. Second, a hardware-based prefetch module supports correlation prefetching. Because stream processing tends to hold incoming data in main memory, the prefetch module moves data from the main memory layer to the buffer layer based on an intelligent clustering algorithm, which adapts to the rapidly changing data access patterns of stream processing applications. By combining heterogeneous main memories, the system can exploit not only the fast access latency of DRAM but also the nonvolatility, scalability, and low power consumption of nonvolatile memory. The proposed hybrid memory architecture with our prefetch buffer structure can improve the buffer hit rate by 9–14% over other prefetch methods, reduce energy consumption by 26% compared with the conventional DRAM-only model, and achieve an execution time similar to that of the DRAM-only model while using only one-eighth of its DRAM capacity.
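The abstract does not specify the clustering algorithm, so the sketch below substitutes a generic Markov-style correlation prefetcher to illustrate the basic idea of moving correlated data from the main memory layer into a small buffer layer. It records which page missed after each missed page and, on a miss, copies the most frequent successors into a small LRU buffer. The class names, the `degree` parameter, the page-granularity trace, and the LRU replacement policy are all illustrative assumptions, not the authors' design.

```python
from collections import Counter, OrderedDict, defaultdict


class LRUBuffer:
    """Small fully associative buffer layer with LRU replacement (illustrative assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # page -> None, ordered least to most recently used

    def lookup(self, page):
        if page in self.lines:
            self.lines.move_to_end(page)  # refresh recency on a hit
            return True
        return False

    def insert(self, page):
        if page in self.lines:
            self.lines.move_to_end(page)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used page
        self.lines[page] = None


class CorrelationPrefetcher:
    """Hypothetical Markov-style prefetcher: learns which page followed each missed page."""

    def __init__(self, buffer, degree=1):
        self.buffer = buffer
        self.degree = degree  # number of correlated successors to prefetch per miss
        self.table = defaultdict(Counter)  # missed page -> counts of the next missed page
        self.last_miss = None
        self.hits = 0
        self.accesses = 0

    def access(self, page):
        self.accesses += 1
        if self.buffer.lookup(page):
            self.hits += 1
            return True
        # Miss: record the correlation between the previous miss and this one.
        if self.last_miss is not None:
            self.table[self.last_miss][page] += 1
        self.last_miss = page
        # Demand-fetch the page from the main memory layer into the buffer.
        self.buffer.insert(page)
        # Prefetch the most frequently observed successors of this page.
        for succ, _ in self.table[page].most_common(self.degree):
            self.buffer.insert(succ)
        return False

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


def simulate(trace, capacity, degree):
    """Replay a page trace and return the buffer hit rate."""
    pf = CorrelationPrefetcher(LRUBuffer(capacity), degree)
    for page in trace:
        pf.access(page)
    return pf.hit_rate()


if __name__ == "__main__":
    trace = [7, 3, 9, 1, 12, 5] * 20  # irregular but repeating page sequence (synthetic)
    print("no prefetch:", simulate(trace, capacity=4, degree=0))
    print("correlation:", simulate(trace, capacity=4, degree=1))
```

With six distinct pages cycling through a four-entry LRU buffer, the buffer alone never hits, whereas learned miss-to-miss correlations let some pages arrive before they are requested. This only shows why correlation prefetching can raise the buffer hit rate on irregular but recurring access patterns; it makes no claim about the magnitudes reported in the paper.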


Figs. 1–15 (images not reproduced)



Acknowledgements

This research was partially supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2015M3C4A7065522) and by an Industry-Academy joint research program between Samsung Electronics and Yonsei University.

Author information


Correspondence to Shin-Dug Kim.


About this article


Cite this article

Lee, S.M., Yoon, SK., Kim, JG. et al. Adaptive correlated prefetch with large-scale hybrid memory system for stream processing. J Supercomput 74, 4746–4770 (2018). https://doi.org/10.1007/s11227-018-2466-7


  • DOI: https://doi.org/10.1007/s11227-018-2466-7
