Reinventing Memory System Design for Many-Accelerator Architecture

Wang, Ying; Zang, Lei; Han, Yin-He; Li, Hua-Wei

doi:10.1007/s11390-014-1429-6

Regular Paper
Published: 23 March 2014

Volume 29, pages 273–280 (2014)

Journal of Computer Science and Technology

Ying Wang¹,²,
Lei Zang¹,
Yin-He Han¹ &
Hua-Wei Li¹

Abstract

The many-accelerator architecture, composed mostly of general-purpose cores and accelerator-like function units (FUs), has become an attractive alternative to homogeneous chip multiprocessors (CMPs) because of its superior power efficiency. However, the emerging many-accelerator processor exhibits a much more complicated memory access pattern than general-purpose processors (GPPs), because the abundant on-chip FUs tend to generate highly concurrent memory streams with distinct locality and bandwidth demands. The disordered memory streams issued by diverse accelerators interfere with one another and cannot be handled efficiently by the conventional main memory interface, which provides an inflexible data fetching mode. Unlike traditional DRAM memory, our proposed Aggregation Memory System (AMS) adapts to the characterized memory streams of different FUs: it provides the FUs with different data fetching sizes and protects the locality of their memory accesses by interleaving their data across memory devices through sub-rank binding. Moreover, with our optimized memory scheduling policy, AMS can batch requests that have no sub-rank conflict into a read burst. Experimental results from trace-based simulation show that AMS delivers both a significant performance boost and energy savings.
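
The batching idea in the abstract can be illustrated with a toy sketch. The abstract does not describe the scheduling policy itself, so this is not the authors' algorithm: it is a simple first-fit grouping under the assumption that each request is modeled as a set of sub-rank indices, and that two requests conflict when those sets overlap. The names `Request`, `subranks`, and `batch_requests` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    fu: str              # label of the issuing function unit (hypothetical)
    subranks: frozenset  # sub-rank indices the request touches (assumed model)


def batch_requests(queue):
    """Group requests into batches whose members share no sub-rank.

    First-fit: each request joins the earliest batch whose occupied
    sub-ranks are disjoint from its own; otherwise it opens a new batch.
    This is an illustrative stand-in, not the policy from the paper.
    """
    batches = []  # list of (occupied_subranks, member_requests)
    for req in queue:
        for i, (occupied, members) in enumerate(batches):
            if occupied.isdisjoint(req.subranks):
                batches[i] = (occupied | req.subranks, members + [req])
                break
        else:
            # No existing batch is conflict-free for this request.
            batches.append((req.subranks, [req]))
    return [members for _, members in batches]


if __name__ == "__main__":
    queue = [
        Request("fu0", frozenset({0, 1})),
        Request("fu1", frozenset({2})),
        Request("fu2", frozenset({1, 3})),
        Request("fu3", frozenset({3})),
    ]
    for n, batch in enumerate(batch_requests(queue)):
        print(n, [r.fu for r in batch])
```

In this model, a request with a small fetching size touches few sub-ranks, so several such requests can share one batch, while a request spanning many sub-ranks tends to occupy a batch alone.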

Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Ying Wang, Lei Zang, Yin-He Han & Hua-Wei Li
University of Chinese Academy of Sciences, Beijing, 100049, China
Ying Wang

Corresponding author

Correspondence to Yin-He Han.

Additional information

Supported by the National Natural Science Foundation of China under Grant Nos. 61173006 and 60921002, the National Basic Research 973 Program of China under Grant No. 2011CB302503, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010403.

About this article

Cite this article

Wang, Y., Zang, L., Han, YH. et al. Reinventing Memory System Design for Many-Accelerator Architecture. J. Comput. Sci. Technol. 29, 273–280 (2014). https://doi.org/10.1007/s11390-014-1429-6

Received: 19 November 2013
Revised: 21 January 2014
Published: 23 March 2014
Issue Date: March 2014
DOI: https://doi.org/10.1007/s11390-014-1429-6
