Abstract
Scalable memory systems provide bandwidth that scales with the growing core counts of multicore and embedded processors. In these systems, as memory controllers (MCs) are scaled, memory traffic per MC is reduced, so transaction queues become shallower. This creates an opportunity to explore transaction queue utilization and its impact on energy utilization. In this paper, we evaluate the performance and energy-per-bit impact of reducing transaction queue sizes as the MCs of these systems are scaled. Experimental results show that reducing the number of entries by 50 % does not affect bandwidth or energy-per-bit levels, whereas under an aggressive reduction of about 90 %, bandwidth is similarly reduced and energy-per-bit rises significantly.




Acknowledgments
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Ministry of Science and Technology (MOST, Taiwan), Providence University, and NVIDIA. This research is based upon work partially supported by the Ministry of Science and Technology (MOST, Taiwan), Providence University, and NVIDIA. We would like to thank Maria Amelia Guitti Marino and the anonymous reviewers for their feedback and suggestions.
Cite this article
Marino, M.D., Li, KC. Implications of shallower memory controller transaction queues in scalable memory systems. J Supercomput 72, 1785–1798 (2016). https://doi.org/10.1007/s11227-015-1485-x
DOI: https://doi.org/10.1007/s11227-015-1485-x