Abstract
Owing to their customizable on-chip resources, reconfigurable computing platforms such as FPGAs can achieve shorter time-to-solution and lower energy-to-solution than general-purpose processors. They have been widely adopted in many important applications, from traditional numerical processing to emerging deep learning systems. As FPGAs have become a promising option for current and future high performance computing, this report summarises and analyses recent FPGA-related efforts, including the latest industrial approaches, state-of-the-art reconfigurable solutions, and issues such as on-chip resources and development productivity.







Acknowledgements
This work was supported in part by the National Key R&D Program of China (grant nos. 2016YFA0602200 and 2017YFA0604500), the National Natural Science Foundation of China (grant nos. 61672312, 41374113, 91530323, U1839206 and 61962051), and the Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology (Qingdao).
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Gan, L., Yuan, M., Yang, J. et al. High performance reconfigurable computing for numerical simulation and deep learning. CCF Trans. HPC 2, 196–208 (2020). https://doi.org/10.1007/s42514-020-00032-x
DOI: https://doi.org/10.1007/s42514-020-00032-x