ABSTRACT
Migrating computation to memory was proposed long ago as a way to overcome the memory bandwidth and latency bottleneck and to increase computational parallelism. While the concept has been applied in several research projects, only recently have the technological hurdles been overcome, and products are now arriving on the market. Although in most cases we will need to develop new algorithms and port applications to new programming models to fully exploit the potential of these products, we will still want to execute existing applications efficiently. In this work, we therefore focus on analyzing the in-memory computation characteristics of existing applications in order to evaluate how well they could move to "Memoryland".
We present a tool that analyzes the locality of the memory accesses of the different routines in an application. Running this tool on a range of applications shows that while some applications fit a small-granularity architecture (a small memory-to-computation ratio), others contain routines that require large amounts of data. We therefore believe that hierarchical in-memory processing architectures are a good fit for the demands of these diverse applications. In addition, the results show that for most applications the analysis can be limited to the routines that issue the most memory accesses.
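The kind of per-routine locality analysis described above can be sketched as follows. This is a minimal illustration in Python over a synthetic memory trace; the routine names, the `locality_profile` helper, and the 64-byte line size are assumptions for the example, not the paper's actual tool, which would instrument real executions.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache-line granularity in bytes

def locality_profile(trace):
    """Summarize per-routine memory footprint from (routine, address) events.

    Returns {routine: (total_accesses, unique_cache_lines)}.
    A low unique-lines-to-accesses ratio means high reuse, i.e. a small
    working set that could sit next to a small in-memory compute unit;
    a ratio near 1 means streaming behavior over a large data set.
    """
    accesses = defaultdict(int)
    lines = defaultdict(set)
    for routine, addr in trace:
        accesses[routine] += 1
        lines[routine].add(addr // LINE_SIZE)
    return {r: (accesses[r], len(lines[r])) for r in accesses}

# Synthetic trace: 'stencil' repeatedly reuses a tiny buffer,
# 'stream' touches a new cache line on every access.
trace  = [("stencil", 4096 + (i % 8) * 8) for i in range(100)]
trace += [("stream", (1 << 20) + i * 64) for i in range(100)]

profile = locality_profile(trace)
print(profile["stencil"])  # many accesses, few unique lines -> small footprint
print(profile["stream"])   # one unique line per access -> large footprint
```

Ranking routines by `total_accesses` before inspecting their footprints mirrors the paper's observation that the analysis can usually be limited to the routines issuing the most memory accesses.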
Moving to memoryland: in-memory computation for existing applications