Abstract:
By sidestepping the limitations at the memory interface, processing-in-memory (PIM) unlocks internally available memory bandwidth to the compute units on the memory side. This abundant bandwidth is conventionally utilized by highly parallel, throughput-oriented, many-core-style PIM architectures, to which bandwidth-bound parallel tasks are offloaded. However, it can be difficult to fully isolate these PIM-suitable tasks, and an offloaded program may include compute-bound sequential phases. These PIM-averse phases constitute a critical performance bottleneck for conventional many-core-style PIM architectures. In this paper, we propose an analytical model for PIM execution that considers a program's bandwidth demand as well as its parallelism. Based on the proposed model, we make a case for an asymmetric PIM architecture that can mitigate the performance bottlenecks for PIM-averse phases while keeping the performance upside for PIM-suitable phases.
Published in: IEEE Computer Architecture Letters (Volume: 18, Issue: 1, 01 Jan.-June 2019)