ABSTRACT
Processing-in-Memory (PIM) is re-emerging as a promising solution to the memory-wall problem as computing systems move into the big data era. Researchers have continually proposed PIM architectures that combine novel memory devices or 3D integration technology, but a universal task scheduling method for these new heterogeneous platforms is still lacking. In this paper, we propose a formalized model to quantify the performance and energy of a PIM+CPU heterogeneous parallel system. In addition, we are the first to build a task partitioning and mapping framework that exploits different PIM engines. In this framework, an application is divided into subtasks that are mapped onto appropriate execution units by the proposed PIM-oriented Earliest-Finish-Time (PEFT) algorithm, maximizing the performance gains brought by PIM. Experimental evaluations show that our PIM-aware framework significantly improves system performance compared to conventional processor architectures.
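The core scheduling idea described above, mapping each subtask onto whichever execution unit finishes it earliest, can be sketched as a small earliest-finish-time (EFT) list scheduler. This is only an illustrative sketch of the general EFT technique, not the paper's PEFT algorithm: the task graph, the per-unit cost numbers, and the uniform data-transfer cost are all made-up assumptions.

```python
# Illustrative EFT list scheduling over a CPU+PIM platform.
# All tasks, costs, and the COMM constant below are assumed values,
# chosen so that PIM is cheaper for the memory-bound tasks t1 and t3.
exec_cost = {
    "t0": {"CPU": 4, "PIM": 9},
    "t1": {"CPU": 8, "PIM": 3},
    "t2": {"CPU": 5, "PIM": 7},
    "t3": {"CPU": 9, "PIM": 4},
}
deps = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
COMM = 2  # assumed transfer cost when parent and child run on different units


def topo_order(deps):
    """Return tasks in dependency (topological) order via DFS."""
    seen, order = set(), []

    def visit(t):
        if t in seen:
            return
        seen.add(t)
        for p in deps[t]:
            visit(p)
        order.append(t)

    for t in deps:
        visit(t)
    return order


def eft_schedule(exec_cost, deps, comm=COMM):
    """Greedily place each task on the unit that minimizes its finish time."""
    unit_free = {"CPU": 0, "PIM": 0}  # time at which each unit becomes idle
    placed = {}                       # task -> (unit, start, finish)
    for t in topo_order(deps):
        best = None
        for u in unit_free:
            # Task is ready once every parent has finished, plus a
            # transfer delay if the parent ran on a different unit.
            ready = 0
            for p in deps[t]:
                pu, _, pf = placed[p]
                ready = max(ready, pf + (comm if pu != u else 0))
            start = max(ready, unit_free[u])
            finish = start + exec_cost[t][u]
            if best is None or finish < best[2]:
                best = (u, start, finish)
        placed[t] = best
        unit_free[best[0]] = best[2]
    return placed


schedule = eft_schedule(exec_cost, deps)
```

With these assumed costs the scheduler offloads the memory-bound tasks t1 and t3 to the PIM unit while t0 and t2 stay on the CPU, so both units overlap and the makespan beats a CPU-only schedule. A full PEFT-style scheduler would additionally rank tasks by upward cost and account for per-edge data volumes rather than a single transfer constant.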
Index Terms
- Selective off-loading to Memory: Task Partitioning and Mapping for PIM-enabled Heterogeneous Systems