Abstract
We are designing a custom computing machine for large-scale fluid simulation with the building-cube method (BCM). In BCM, parallel computation is performed on cubes, each of which is an orthogonal grid with a fixed resolution of cells. Although BCM is advantageous for balancing loads across cubes, it suffers from limited efficiency and scalability on general-purpose supercomputers due to insufficient memory bandwidth and the communication overhead of the interconnection network. In this paper, we present a custom computing architecture for FPGA-based scalable BCM computation with a dedicated network, called an accelerator domain network (ADN). We design a cube engine that allows bandwidth-efficient computation of cubes based on streamed stencil computation of the fractional-step method. Through prototype implementation, we evaluate the potential performance of the architecture. For an Altera Stratix V 28 nm FPGA, we estimate that a single FPGA achieves a peak performance of 107 GFlop/s in single precision.
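The idea of streamed stencil computation mentioned in the abstract can be illustrated with a minimal software sketch. This is not the cube engine described in the paper: the function name `streamed_stencil`, the 2D grid, the 5-point averaging stencil, and the pass-through treatment of boundary cells are all assumptions made for illustration. What it shows is the general property that makes streaming bandwidth-efficient: each cell is read from the input stream exactly once, and a sliding buffer of 2 × width + 1 cells holds every neighbor the stencil needs.

```python
from collections import deque


def streamed_stencil(stream, width, height):
    """Apply a 5-point averaging stencil to a row-major stream of cells.

    Hypothetical illustration only: each input cell is consumed once, and a
    sliding window of 2*width + 1 cells supplies all stencil neighbors.
    Boundary cells are passed through unchanged (an assumption of this sketch).
    """
    window = deque(maxlen=2 * width + 1)
    out = []
    count = 0  # number of cells consumed from the stream so far
    for value in stream:
        window.append(value)
        count += 1
        center = count - 1 - width  # cell whose south neighbor just arrived
        if center < 0:
            continue
        row, col = divmod(center, width)
        start = count - len(window)  # grid index stored in window[0]
        pos = center - start         # position of the center cell in the window
        if 0 < row < height - 1 and 0 < col < width - 1:
            north = window[pos - width]
            west = window[pos - 1]
            east = window[pos + 1]
            south = window[pos + width]
            out.append((north + west + window[pos] + east + south) / 5.0)
        else:
            out.append(window[pos])
    # The final row never gets a south neighbor; it is boundary, so pass through.
    out.extend(list(window)[-width:])
    return out


if __name__ == "__main__":
    w, h = 4, 3
    cells = [i * i for i in range(w * h)]
    print(streamed_stencil(iter(cells), w, h))
```

In hardware, the sliding window would correspond to an on-chip buffer, so external memory is touched only once per cell per pass; the sketch only mirrors that access pattern in software.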