DOI: 10.1145/2847263.2847264
research-article

Efficient Memory Partitioning for Parallel Data Access via Data Reuse

Published: 21 February 2016

ABSTRACT

In this paper, we propose an efficient memory partitioning algorithm for parallel data access via data reuse. We observe that in most image and video processing applications, a large amount of data is reused across different iterations of a loop nest. Motivated by this observation, we propose caching these reusable data in on-chip registers, which can be organized as register chains, so that they need not be re-fetched from memory. The remaining, non-reusable data are then partitioned into several memory banks by a memory partitioning algorithm. We also revise the existing padding method to handle a case that occurs frequently in our method: partition vectors in which some components are zero. Experimental results demonstrate that, compared with state-of-the-art algorithms, the proposed method reduces the required number of memory banks by 59.8% on average. The resources needed for bank mapping are also significantly reduced: LUTs by 78.6%, flip-flops by 66.8%, and DSP48Es by 41.7%. Moreover, the proposed method incurs zero storage overhead for most of the widely used access patterns in image filtering.
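
The bank-count argument can be illustrated with a small Python sketch. It rests on several assumptions that do not come from the paper:

- a 3×3 image-filtering window whose inner loop advances one column per iteration;
- a row-major 2-D array;
- a hypothetical cyclic bank function bank(x) = (α·x) mod N, where α is the partition vector and N is the number of banks.

The helper names (`new_offsets`, `bank_of`, `conflict_free`, `min_banks`) are illustrative. The brute-force search stands in for the paper's partitioning algorithm rather than reproducing it, and neither padding nor intra-bank offsets are modeled.

```python
from itertools import product

# Assumed access pattern: a 3x3 window of offsets (di, dj) around pixel (i, j),
# as in a typical image filter; the inner loop advances j by one.
WINDOW = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def new_offsets(window, step=(0, 1)):
    """Offsets that must be fetched from memory after the window moves by `step`.

    Offset o was already read in the previous iteration if o + step lies in the
    window; such data can stay in register chains instead of being re-fetched.
    """
    ws = set(window)
    return [o for o in window if (o[0] + step[0], o[1] + step[1]) not in ws]


def bank_of(offset, alpha, n_banks):
    """Hypothetical cyclic bank function: (alpha . x) mod N."""
    return sum(a * x for a, x in zip(alpha, offset)) % n_banks


def conflict_free(offsets, alpha, n_banks):
    """True if all offsets map to distinct banks.

    Because the bank function is linear, moving the window base shifts every
    bank index by the same amount modulo N, so checking offsets alone suffices.
    """
    banks = {bank_of(o, alpha, n_banks) for o in offsets}
    return len(banks) == len(offsets)


def min_banks(offsets, max_coeff=4, max_banks=64):
    """Brute-force the smallest N (and a partition vector alpha) with no conflicts.

    Returns (N, alpha), or None if nothing is found within the search limits.
    This exhaustive search is only an illustration, not the paper's algorithm.
    """
    dims = len(offsets[0])
    # At least one bank per parallel access is required.
    for n_banks in range(len(offsets), max_banks + 1):
        for alpha in product(range(max_coeff), repeat=dims):
            if any(alpha) and conflict_free(offsets, alpha, n_banks):
                return n_banks, alpha
    return None


if __name__ == "__main__":
    fresh = new_offsets(WINDOW)
    print("all accesses:", min_banks(WINDOW))        # (9, (1, 3))
    print("new accesses:", fresh, min_banks(fresh))  # [(-1, 1), (0, 1), (1, 1)] (3, (1, 0))
```

Under these assumptions, serving all nine window accesses in parallel needs 9 banks (for example with α = (1, 3)). Once the six reused pixels are held in register chains, the three pixels of the incoming column fit in 3 banks with α = (1, 0). That partition vector has a zero component, which is the situation addressed by the revised padding method. These numbers describe only this toy window and are not the paper's reported results.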


Published in: FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2016, 298 pages
ISBN: 9781450338561
DOI: 10.1145/2847263

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

FPGA '16 paper acceptance rate: 20 of 111 submissions (18%). Overall acceptance rate: 125 of 627 submissions (20%).
