In this paper, we discuss a highly parallel implementation of the lattice QCD Wilson Dirac operator on FPGAs. The operator is described as a data-flow graph using OpenSPL and acts on a four-dimensional lattice, collecting all nearest-neighbour terms. The resulting kernel fits on an Altera Stratix V FPGA using fixed-point arithmetic, which allows us to compute all arithmetic operations simultaneously. In addition, the OpenSPL language implicitly expresses optimized data locality through its offset operator, which we study in the presented implementation. The lattice is held in DDR3 memory on the FPGA accelerator card and streamed into the FPGA over six memory channels; we use the MAX4 card and the software framework from Maxeler. The operator is memory bound, with an equivalent arithmetic intensity of 0.92 FLOPs/Byte. At a clock frequency of 133 MHz, the equivalent theoretical peak performance is 176 GFLOP/s. Because memory bandwidth is the limiting factor, we also address the memory interface and the memory access pattern.
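The two headline figures can be roughly reproduced with a short calculation. This is only a sketch under assumptions that the abstract does not state: the commonly quoted count of 1320 FLOPs per lattice site for the Wilson Dirac operator, single-precision-equivalent data (4 bytes per real number), no reuse of loaded data, and one lattice site processed per clock cycle. All constant and function names are illustrative.

```python
# Back-of-the-envelope check of the arithmetic intensity and peak performance.
# ASSUMPTIONS (not taken from the abstract): 1320 FLOPs per site, 4 bytes per
# real number, no data reuse, one lattice site processed per clock cycle.

FLOPS_PER_SITE = 1320          # assumed, commonly quoted Wilson operator count
BYTES_PER_REAL = 4             # assumed single-precision-equivalent width
REALS_PER_SPINOR = 4 * 3 * 2   # 4 spin x 3 colour components, each complex
REALS_PER_LINK = 3 * 3 * 2     # 3x3 complex gauge link matrix
NEIGHBOURS = 2 * 4             # forward and backward in 4 dimensions


def bytes_per_site() -> int:
    """Bytes moved per site: 8 neighbour spinors, 8 links, 1 output spinor."""
    reals = NEIGHBOURS * (REALS_PER_SPINOR + REALS_PER_LINK) + REALS_PER_SPINOR
    return reals * BYTES_PER_REAL


def arithmetic_intensity() -> float:
    """Equivalent FLOPs per byte of memory traffic."""
    return FLOPS_PER_SITE / bytes_per_site()


def peak_gflops(clock_hz: float, sites_per_cycle: int = 1) -> float:
    """Equivalent peak performance in GFLOP/s at the given clock frequency."""
    return FLOPS_PER_SITE * sites_per_cycle * clock_hz / 1e9


def required_bandwidth_gbs(clock_hz: float) -> float:
    """Memory bandwidth in GB/s needed to sustain the peak without stalls."""
    return peak_gflops(clock_hz) / arithmetic_intensity()


if __name__ == "__main__":
    clock = 133e6  # 133 MHz, as given in the abstract
    print(f"bytes per site:       {bytes_per_site()}")
    print(f"arithmetic intensity: {arithmetic_intensity():.2f} FLOPs/Byte")
    print(f"peak performance:     {peak_gflops(clock):.0f} GFLOP/s")
    print(f"required bandwidth:   {required_bandwidth_gbs(clock):.0f} GB/s")
```

Under these assumptions the intensity and peak round to the 0.92 FLOPs/Byte and 176 GFLOP/s quoted above. The last function estimates the memory bandwidth that would be needed to sustain that peak, which illustrates why the memory interface and access pattern matter for a memory-bound operator.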