High performance reconfigurable computing for numerical simulation and deep learning

Gan, Lin; Yuan, Ming; Yang, Jinzhe; Zhao, Wenlai; Luk, Wayne; Yang, Guangwen

doi:10.1007/s42514-020-00032-x

Survey Paper
Published: 11 June 2020

Volume 2, pages 196–208, (2020)

CCF Transactions on High Performance Computing

Lin Gan^1,2,3,
Ming Yuan^3,4,
Jinzhe Yang^3,5,
Wenlai Zhao^1,2,3,
Wayne Luk^5 &
Guangwen Yang^1,2,3

Abstract

Due to their customizable on-chip resources, reconfigurable computing platforms such as FPGAs are able to achieve better time-to-solution and energy-to-solution than general-purpose processors. They have been widely adopted in many important applications, from traditional numerical processing to emerging deep learning systems. Since FPGAs have become promising options for current and future high performance computing, this report summarises and analyses recent FPGA-related efforts, including the latest industrial approaches, the state-of-the-art reconfigurable solutions, and various issues such as on-chip resources and development productivity.

Acknowledgements

This work was supported in part by the National Key R&D Program of China (grant nos. 2016YFA0602200 and 2017YFA0604500), the National Natural Science Foundation of China (grant nos. 61672312, 41374113, 91530323, U1839206, 61962051), and by the Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology (Qingdao).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, China
Lin Gan, Wenlai Zhao & Guangwen Yang
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
Lin Gan, Wenlai Zhao & Guangwen Yang
National Supercomputing Center in Wuxi, Jiangsu, China
Lin Gan, Ming Yuan, Jinzhe Yang, Wenlai Zhao & Guangwen Yang
School of Internet of Things Engineering, Jiangnan University, Jiangsu, China
Ming Yuan
Department of Computing, Imperial College London, London, UK
Jinzhe Yang & Wayne Luk

Corresponding author

Correspondence to Lin Gan.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Gan, L., Yuan, M., Yang, J. et al. High performance reconfigurable computing for numerical simulation and deep learning. CCF Trans. HPC 2, 196–208 (2020). https://doi.org/10.1007/s42514-020-00032-x

Received: 14 February 2020
Accepted: 17 April 2020
Published: 11 June 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s42514-020-00032-x
