DOI: 10.1145/2989081.2989131
research-article

ConGen: An Application Specific DRAM Memory Controller Generator

Published: 03 October 2016

ABSTRACT

The increasing gap between the bandwidth requirements of modern Systems on Chip (SoCs) and the I/O data rate delivered by Dynamic Random Access Memory (DRAM), known as the memory wall, limits the performance of today's data-intensive applications. General-purpose memory controllers use online scheduling techniques to increase memory bandwidth. Because their buffer depth is limited, they have only a local view of the executing application. However, numerous applications have regular or fixed memory access patterns, which are not yet exploited to overcome the memory wall. In this paper, we present a holistic methodology for generating an Application Specific Memory Controller (ASMC), which has a global view of the application and uses application knowledge to decrease energy consumption and increase bandwidth. To generate an ASMC, we analyze the DRAM access pattern of the application offline and generate a custom address mapping by solving a combinatorial sequence partitioning problem.
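To make the role of the address mapping concrete, the following is a minimal sketch and not the method of the paper. It assumes a toy 10-bit physical address split into 4 column bits, 2 bank bits, and 4 row bits, a hypothetical strided trace, and a simple open-page model in which an access misses whenever its bank currently holds a different row. Instead of the combinatorial sequence partitioning formulation named in the abstract, it simply enumerates every bit assignment by brute force, which is only feasible at this toy size. All names (`decode`, `row_misses`, `all_mappings`) are illustrative.

```python
from itertools import combinations

ADDR_BITS = 10          # toy address width (assumption)
N_COL, N_BANK = 4, 2    # the remaining 4 bits select the row (assumption)


def decode(addr, bits):
    """Gather the given address bit positions (LSB first) into one field value."""
    value = 0
    for i, bit in enumerate(bits):
        value |= ((addr >> bit) & 1) << i
    return value


def row_misses(trace, mapping):
    """Count accesses that hit a bank whose open row differs (open-page model)."""
    open_row = {}  # bank -> row currently held in that bank's row buffer
    misses = 0
    for addr in trace:
        bank = decode(addr, mapping["bank"])
        row = decode(addr, mapping["row"])
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row
    return misses


def all_mappings():
    """Enumerate every assignment of address bits to column, bank, and row."""
    bits = range(ADDR_BITS)
    for col in combinations(bits, N_COL):
        rest = [b for b in bits if b not in col]
        for bank in combinations(rest, N_BANK):
            row = tuple(b for b in rest if b not in bank)
            yield {"column": col, "bank": bank, "row": row}


# Hypothetical fixed access pattern: 16 accesses with a stride of 64.
trace = [i * 64 for i in range(16)]

# Conventional row-bank-column mapping: low bits are column, high bits are row.
baseline = {"column": (0, 1, 2, 3), "bank": (4, 5), "row": (6, 7, 8, 9)}

# Offline search: pick the mapping with the fewest row misses for this trace.
best = min(all_mappings(), key=lambda m: row_misses(trace, m))

baseline_misses = row_misses(trace, baseline)
best_misses = row_misses(trace, best)
print(baseline_misses, best_misses, best)
```

With the baseline mapping, every access in this trace lands in a new row of the same bank, so each one is a miss. The search instead routes the four address bits that actually change (bits 6 to 9) to the column field, so the whole trace stays within a single open row. Real traces and device geometries are far larger, which is why an exhaustive search like this one does not scale.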

    Published in

      MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
      October 2016
      463 pages
      ISBN:9781450343053
      DOI:10.1145/2989081

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited
