Power Performance Profiling of 3-D Stencil Computation on an FPGA Accelerator for Efficient Pipeline Optimization

Authors:
Koji Okina (Nagasaki University, Japan)
Rie Soejima (Nagasaki University, Japan)
Kota Fukumoto (Nagasaki University, Japan)
Yuichiro Shibata (Nagasaki University, Japan)
Kiyoshi Oguri (Nagasaki University, Japan)

ACM SIGARCH Computer Architecture News, Volume 43, Issue 4, September 2015, pp. 9–14
https://doi.org/10.1145/2927964.2927967

Published: 22 April 2016

Abstract

This paper discusses power-performance optimization for 3-D stencil computing on a stream-oriented FPGA accelerator with high-level synthesis. Taking a heat conduction simulation and an FDTD electromagnetic field simulation as benchmark applications, it presents power-performance profiling results that focus on the effect of high-level pipeline parameters. The results show that optimal power efficiency is achieved essentially by optimizing execution performance. The relationship between power efficiency and clock frequency is also discussed.



Published in
ACM SIGARCH Computer Architecture News, Volume 43, Issue 4 (HEART '15)
September 2015, 98 pages
ISSN: 0163-5964
DOI: 10.1145/2927964
Editor: Doug DeGroot
Copyright © 2016 Authors
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifier: column
