Abstract
This paper discusses power-performance optimization for 3-D stencil computing on a stream-oriented FPGA accelerator with high-level synthesis. Taking a heat conduction simulation and an FDTD electromagnetic field simulation as benchmark applications, we present power-performance profiling results, focusing on the effect of high-level pipeline parameters. The results show that optimal power efficiency is achieved essentially by optimizing execution performance. The relationship between power efficiency and clock frequency is also discussed.