High performance reconfigurable computing for numerical simulation and deep learning

  • Survey Paper
CCF Transactions on High Performance Computing

Abstract

Because their on-chip resources are customizable, reconfigurable computing platforms such as FPGAs can achieve better time-to-solution and energy-to-solution than general-purpose processors. They have been widely adopted in many important applications, from traditional numerical processing to emerging deep learning systems. Since FPGAs have become promising options for current and future high-performance computing, this survey summarises and analyses recent FPGA-related efforts, including the latest industrial approaches, state-of-the-art reconfigurable solutions, and open issues such as on-chip resources and development productivity.

Acknowledgements

This work was supported in part by the National Key R&D Program of China (grant nos. 2016YFA0602200 and 2017YFA0604500), the National Natural Science Foundation of China (grant nos. 61672312, 41374113, 91530323, U1839206, and 61962051), and the Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology (Qingdao).

Author information

Corresponding author

Correspondence to Lin Gan.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Gan, L., Yuan, M., Yang, J. et al. High performance reconfigurable computing for numerical simulation and deep learning. CCF Trans. HPC 2, 196–208 (2020). https://doi.org/10.1007/s42514-020-00032-x

  • DOI: https://doi.org/10.1007/s42514-020-00032-x
