DOI: 10.1145/1878961.1878989
research-article

Automatic memory partitioning: increasing memory parallelism via data structure partitioning

Published: 24 October 2010

ABSTRACT

In high-level synthesis, the performance of pipelined designs is often limited by the number of memory banks available to the synthesis system. Using multiple memory banks can improve the performance of accelerated applications, but programmers currently must assign data structures to specific memory banks on the accelerator by hand. This paper describes Automatic Memory Partitioning, a method for automatically partitioning data structures across multiple memory banks to increase parallelism and performance. We use source code instrumentation to collect memory traces and detect linear memory access patterns. The traces are then used to split data structures into disjoint memory regions and to determine which segments may benefit from parallel memory access. We present an algorithm based on integer linear programming (ILP) for allocating memory segments to memory banks. Experiments show significant performance improvements while using a minimal number of memory banks.
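The abstract outlines three steps: detect linear access patterns in a memory trace, split data structures into disjoint regions, and allocate those regions to banks so that regions accessed in parallel do not share a bank. The Python sketch below illustrates that flow under simplifying assumptions that are ours, not the paper's: the trace format (a dict from a hypothetical access-site name to the addresses it touched), the rule that overlapping address ranges merge into one region, and the conflict model (every listed site is issued in the same pipeline cycle). The paper solves the allocation with an ILP; here an exhaustive search stands in for it. It finds a minimum-bank assignment under this toy model but is exponential in the number of regions.

```python
from itertools import product


def linear_stride(addrs):
    """Return the constant stride if addrs form an arithmetic progression, else None."""
    if len(addrs) < 2:
        return None
    stride = addrs[1] - addrs[0]
    if all(b - a == stride for a, b in zip(addrs, addrs[1:])):
        return stride
    return None


def split_regions(trace):
    """Group linearly accessed sites into disjoint address regions.

    trace: dict mapping a (hypothetical) access-site name to the list of
    addresses it touched. Non-linear sites are ignored. Sites whose address
    ranges overlap are merged into one region, since they share memory.
    Returns (region_of, regions): site -> region index, and [lo, hi] bounds.
    """
    ranges = sorted(
        (min(addrs), max(addrs), site)
        for site, addrs in trace.items()
        if linear_stride(addrs) is not None
    )
    region_of, regions = {}, []
    for lo, hi, site in ranges:
        if regions and lo <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], hi)  # overlaps: extend last region
        else:
            regions.append([lo, hi])  # gap: start a new disjoint region
        region_of[site] = len(regions) - 1
    return region_of, regions


def assign_banks(region_of, n_regions, parallel_sites):
    """Find the fewest banks such that regions accessed in the same cycle
    land in different banks. Exhaustive search (exponential in n_regions)
    stands in for the paper's ILP formulation."""
    sites = [s for s in parallel_sites if s in region_of]  # skip non-linear sites
    conflicts = {
        (region_of[a], region_of[b])
        for i, a in enumerate(sites)
        for b in sites[i + 1:]
        if region_of[a] != region_of[b]
    }
    for k in range(1, n_regions + 1):
        for banks in product(range(k), repeat=n_regions):
            if all(banks[r] != banks[s] for r, s in conflicts):
                return k, list(banks)
    return 0, []


if __name__ == "__main__":
    # Hypothetical trace of one loop: a[i], b[i], c[i] with 4-byte elements,
    # plus an indirect access that has no constant stride.
    trace = {
        "load_a": [0, 4, 8, 12],
        "load_b": [100, 104, 108, 112],
        "store_c": [200, 204, 208, 212],
        "load_idx": [300, 316, 304, 308],
    }
    region_of, regions = split_regions(trace)
    print(regions)  # [[0, 12], [100, 112], [200, 212]]
    print(assign_banks(region_of, len(regions), ["load_a", "load_b", "store_c"]))  # (3, [0, 1, 2])
    print(assign_banks(region_of, len(regions), ["load_a", "store_c"]))  # (2, [0, 0, 1])
```

In this example `load_idx` is skipped because its addresses have no constant stride. When all three linear sites are issued together, each region needs its own bank; when only `load_a` and `store_c` conflict, two banks suffice for three regions, which is the kind of saving the "minimal number of memory banks" goal refers to.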


Published in

CODES/ISSS '10: Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
October 2010
348 pages
ISBN: 9781605589053
DOI: 10.1145/1878961

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States



Acceptance Rates

Overall Acceptance Rate: 280 of 864 submissions, 32%
