Reinventing Memory System Design for Many-Accelerator Architecture

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The many-accelerator architecture, composed mostly of general-purpose cores and accelerator-like function units (FUs), has become an attractive alternative to homogeneous chip multiprocessors (CMPs) because of its superior power efficiency. However, an emerging many-accelerator processor shows a much more complicated memory access pattern than general-purpose processors (GPPs), because its abundant on-chip FUs tend to generate highly concurrent memory streams with distinct locality and bandwidth demands. These disordered streams from diverse accelerators interfere with one another and cannot be handled efficiently by the conventional main-memory interface, which offers only an inflexible data-fetching mode. Unlike traditional DRAM memory, our proposed Aggregation Memory System (AMS) adapts to the characteristic memory streams of different FUs: it provides the FUs with different data-fetching sizes and preserves the locality of their memory accesses by intelligently interleaving their data across memory devices through sub-rank binding. Moreover, with our optimized memory-scheduling policy, AMS can batch requests that have no sub-rank conflict into a single read burst. Results from trace-based simulation show that AMS brings both a notable performance improvement and energy savings.
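The abstract does not specify how AMS maps FUs to sub-ranks or how its scheduler forms bursts, so the following is only a minimal Python sketch of the two ideas under stated assumptions. It assumes a rank split into 8 sub-ranks, each returning 8 bytes per read burst; each FU is bound to a hypothetical bitmask of sub-ranks, so its fetch size grows with the number of sub-ranks it owns; and a greedy, oldest-first batcher groups requests whose sub-rank masks do not overlap into one read burst. The names `Binding`, `Request`, `batch_read_burst`, and the FU labels are hypothetical, and the sketch ignores DRAM timing, bank state, and row-buffer hits.

```python
from dataclasses import dataclass

# Assumptions for illustration only (not taken from the paper):
NUM_SUBRANKS = 8       # a rank split into 8 sub-ranks (e.g. one x8 chip each)
BYTES_PER_SUBRANK = 8  # bytes one sub-rank returns per read burst


@dataclass(frozen=True)
class Binding:
    """Hypothetical sub-rank binding: the sub-ranks that hold one FU's data."""
    fu: str
    mask: int  # bit i set -> sub-rank i is bound to this FU

    def __post_init__(self):
        if not 0 < self.mask < (1 << NUM_SUBRANKS):
            raise ValueError("mask must select at least one valid sub-rank")

    @property
    def fetch_bytes(self) -> int:
        # Fetch size scales with the number of sub-ranks bound to the FU.
        return bin(self.mask).count("1") * BYTES_PER_SUBRANK


@dataclass(frozen=True)
class Request:
    fu: str   # issuing function unit (or core)
    row: int  # target DRAM row (unused by this simplified batcher)
    seq: int  # arrival order; smaller is older


def batch_read_burst(queue, bindings):
    """Greedily form one read burst from requests with disjoint sub-rank masks.

    Requests are scanned oldest first; a request joins the burst only if none
    of its sub-ranks is already claimed. Returns (burst, remaining requests).
    DRAM timing, bank state and row-buffer hits are ignored.
    """
    claimed = 0
    burst, remaining = [], []
    for req in sorted(queue, key=lambda r: r.seq):
        mask = bindings[req.fu].mask
        if claimed & mask == 0:  # no sub-rank conflict with the burst so far
            burst.append(req)
            claimed |= mask
        else:
            remaining.append(req)
    return burst, remaining


if __name__ == "__main__":
    bindings = {
        "fu_a": Binding("fu_a", 0b00000011),  # 2 sub-ranks -> 16-byte fetch
        "fu_b": Binding("fu_b", 0b00111100),  # 4 sub-ranks -> 32-byte fetch
        "fu_c": Binding("fu_c", 0b11000000),  # 2 sub-ranks -> 16-byte fetch
        "cpu": Binding("cpu", 0b11111111),    # all 8 -> 64-byte cache line
    }
    queue = [Request("fu_a", 5, 0), Request("cpu", 9, 1),
             Request("fu_b", 2, 2), Request("fu_c", 7, 3)]
    burst, rest = batch_read_burst(queue, bindings)
    print([r.fu for r in burst])  # ['fu_a', 'fu_b', 'fu_c']
    print([r.fu for r in rest])   # ['cpu']
```

In this toy run, the three accelerator requests occupy disjoint sub-ranks and share one burst, while the full-width `cpu` request waits for the next one. Because the batcher skips conflicting requests, a wide request can be deferred repeatedly; a practical scheduler would need an aging or fairness rule, which this sketch omits.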



Author information

Corresponding author

Correspondence to Yin-He Han.

Additional information

Supported by the National Natural Science Foundation of China under Grant Nos. 61173006, 60921002, the National Basic Research 973 Program of China under Grant No. 2011CB302503, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010403.

Electronic supplementary material

Electronic supplementary material (DOC, 29 KB) is available with the online version of this article.

About this article

Cite this article

Wang, Y., Zang, L., Han, YH. et al. Reinventing Memory System Design for Many-Accelerator Architecture. J. Comput. Sci. Technol. 29, 273–280 (2014). https://doi.org/10.1007/s11390-014-1429-6

  • DOI: https://doi.org/10.1007/s11390-014-1429-6