ABSTRACT
The increasing gap between the bandwidth requirements of modern Systems on Chip (SoCs) and the I/O data rate delivered by Dynamic Random Access Memory (DRAM), known as the Memory Wall, limits the performance of today's data-intensive applications. General-purpose memory controllers use online scheduling techniques to increase memory bandwidth. Because of their limited buffer depth, they have only a local view of the executed application. However, numerous applications have regular or fixed memory access patterns, which are not yet exploited to overcome the Memory Wall. In this paper, we present a holistic methodology to generate an Application Specific Memory Controller (ASMC), which has a global view of the application and uses application knowledge to reduce energy consumption and increase bandwidth. To generate an ASMC, we analyze the DRAM access pattern of the application offline and generate a custom address mapping by solving a combinatorial sequence partitioning problem.
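To illustrate why an address mapping derived from a known access pattern can help, the following minimal sketch replays a fixed trace against two mappings and counts row misses (accesses that force a new row to be opened in a bank). It is not the methodology of the paper: the toy geometry (4 column bits, 2 bank bits, 4 row bits), the strided trace, the helper names `decode` and `count_row_misses`, and the hand-picked custom mapping are all assumptions made for illustration. The paper derives its mapping by solving a sequence partitioning problem, which is not reproduced here.

```python
def decode(addr, bit_positions):
    """Gather the listed address bits (LSB first) into one integer field."""
    value = 0
    for i, pos in enumerate(bit_positions):
        value |= ((addr >> pos) & 1) << i
    return value


def count_row_misses(trace, mapping):
    """Count accesses that hit a bank whose open row differs from the requested row."""
    open_row = {}  # bank -> currently open row
    misses = 0
    for addr in trace:
        bank = decode(addr, mapping["bank"])
        row = decode(addr, mapping["row"])
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row
    return misses


# Hypothetical 10-bit address space: 4 column bits, 2 bank bits, 4 row bits.
default_mapping = {"column": (0, 1, 2, 3), "bank": (4, 5), "row": (6, 7, 8, 9)}

# Hand-picked alternative for this trace: row and column bits are swapped.
custom_mapping = {"column": (6, 7, 8, 9), "bank": (4, 5), "row": (0, 1, 2, 3)}

# Fixed strided pattern (stride 64), e.g. a column-wise walk over a 16 x 64 array.
trace = [r * 64 + c for c in range(4) for r in range(16)]

default_misses = count_row_misses(trace, default_mapping)
custom_misses = count_row_misses(trace, custom_mapping)
print(f"default mapping: {default_misses} row misses out of {len(trace)} accesses")
print(f"custom mapping:  {custom_misses} row misses out of {len(trace)} accesses")
```

With this trace the default mapping changes rows on every access, while the swapped mapping keeps consecutive accesses in the same row. An online scheduler with a shallow buffer cannot recover this locality, whereas an offline analysis sees the whole pattern.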
ConGen: An Application Specific DRAM Memory Controller Generator