ABSTRACT
In this paper, we propose an efficient memory partitioning algorithm for parallel data access via data reuse. We found that in most image and video processing applications, a large amount of data can be reused across different iterations of a loop nest. Motivated by this observation, we propose to cache the reusable data in on-chip registers, which can be organized as register chains. The non-reusable data are then partitioned into several memory banks by a memory partitioning algorithm. We revise the existing padding method to cover a case that occurs frequently in our method, in which some components of the partition vector are zero. Experimental results demonstrate that, compared with state-of-the-art algorithms, the proposed method reduces the required number of memory banks by 59.8% on average. The resources required for bank mapping are also significantly reduced: the number of LUTs by 78.6%, the number of flip-flops by 66.8%, and the number of DSP48Es by 41.7%. Moreover, the storage overhead of the proposed method is zero for most of the widely used access patterns in image filtering.
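The idea of combining register chains with bank partitioning can be illustrated with a small sketch. It is not the authors' algorithm: the 3x3 window, the function names `bank_of` and `windows_with_reuse`, and the partition vector (1, 0) are all assumptions chosen for illustration. Each window row is held in a three-register chain, so only one new element per row is fetched at each step, and those three fetches fall into three different banks. Without reuse, all nine window elements would have to be read in the same step.

```python
# Illustrative sketch only: a 3x3 window sliding over an image, with the
# reusable elements held in register chains. All names are hypothetical.

def bank_of(i, j, n_banks=3):
    # Assumed partition vector (1, 0): the column index does not affect the bank.
    return (1 * i + 0 * j) % n_banks

def windows_with_reuse(image):
    """Yield (i, j, window, banks_read) for every 3x3 window position."""
    rows, cols = len(image), len(image[0])
    for i in range(rows - 2):
        # One chain of three registers per window row, refilled for each row band.
        chains = [[None, None, None] for _ in range(3)]
        for j in range(cols):
            banks_read = []
            for r in range(3):
                # Shift the chain by one and fetch a single new element for this row.
                chains[r] = chains[r][1:] + [image[i + r][j]]
                banks_read.append(bank_of(i + r, j))
            if j >= 2:  # the chains are full once three columns have been loaded
                yield i, j - 2, [list(c) for c in chains], banks_read
```

In this sketch the three fetches of one step always touch rows `i`, `i + 1`, and `i + 2`, which map to three distinct banks, so the second component of the partition vector can be zero.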
Index Terms
- Efficient Memory Partitioning for Parallel Data Access via Data Reuse