Evaluation of Stencil Based Algorithm Parallelization over System-on-Chip FPGA Using a High Level Synthesis Tool

Castano-Londono, Luis; Alzate Anzola, Cristian; Marquez-Viloria, David; Gallo, Guillermo; Osorio, Gustavo

doi:10.1007/978-3-030-31019-6_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1052)

Included in the following conference series:

Workshop on Engineering Applications

Abstract

Iterative stencil computations are present in many scientific and engineering applications, and the acceleration of stencil codes on parallel architectures has been widely studied. On FPGA-based heterogeneous architectures, stencil parallelization has been reported using either traditional RTL design or directives applied to C/C++ code in high-level synthesis (HLS) tools. In both cases, FPGAs have been shown to provide better performance per watt than CPU- or GPU-based systems. However, work with HLS tools has generally been limited to applying parallelization directives, without evaluating how adapting the algorithm itself could change their effect. This work proposes dividing the inner loop of the stencil code so that total latency is reduced when memory partition and pipeline directives are applied. The case study is the two-dimensional Laplace equation, implemented on a ZedBoard and an Ultra96 board using Vivado HLS. Performance is evaluated as a function of the number of inner-loop divisions and on-chip memory partitions, in terms of latency, power consumption, FPGA resource usage, and speed-up.
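
The inner-loop division described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the grid size N, the number of divisions P, the function names, and the choice of block partitioning along the column dimension are all assumptions made for illustration. The `#pragma HLS` lines are Vivado HLS directives; a regular C++ compiler ignores them, so the sketch can be checked in software against a plain reference sweep.

```cpp
#include <cassert>
#include <cmath>

// Illustrative sizes only; the chapter's actual grid and factors are not given here.
constexpr int N = 16;     // grid points per side, boundaries included
constexpr int P = 4;      // number of inner-loop divisions (and memory banks)
constexpr int W = N / P;  // columns handled by each division
static_assert(N % P == 0, "P must divide N for this simple split");

// Reference sweep: one Jacobi iteration of the 2D Laplace equation.
// Only interior points are written; boundary values of `out` are left as-is.
void laplace_sweep_ref(const float in[N][N], float out[N][N]) {
    for (int i = 1; i < N - 1; ++i)
        for (int j = 1; j < N - 1; ++j)
            out[i][j] = 0.25f * (in[i - 1][j] + in[i + 1][j] +
                                 in[i][j - 1] + in[i][j + 1]);
}

// Same sweep with the inner (column) loop divided into P segments.
// The factor in the partition directives is written literally and must match P.
void laplace_sweep_split(const float in[N][N], float out[N][N]) {
#pragma HLS ARRAY_PARTITION variable=in block factor=4 dim=2
#pragma HLS ARRAY_PARTITION variable=out block factor=4 dim=2
    for (int i = 1; i < N - 1; ++i) {
        for (int jj = 0; jj < W; ++jj) {
#pragma HLS PIPELINE II=1
            // Each p addresses a different column segment, so the P updates
            // are independent and can be unrolled into parallel hardware.
            for (int p = 0; p < P; ++p) {
#pragma HLS UNROLL
                const int j = p * W + jj;
                if (j >= 1 && j < N - 1) {  // skip the two boundary columns
                    out[i][j] = 0.25f * (in[i - 1][j] + in[i + 1][j] +
                                         in[i][j - 1] + in[i][j + 1]);
                }
            }
        }
    }
}

// Host-side driver (assumption): runs an even number of sweeps, ping-ponging
// two buffers so the final result ends up in `a`. Both buffers must start
// with the same boundary values.
void jacobi(float a[N][N], float b[N][N], int iters) {
    assert(iters >= 0 && iters % 2 == 0);
    for (int k = 0; k < iters; k += 2) {
        laplace_sweep_split(a, b);
        laplace_sweep_split(b, a);
    }
}

// Test helper: top boundary held at 1, everything else at 0.
void init_grid(float g[N][N]) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            g[i][j] = (i == 0) ? 1.0f : 0.0f;
}

// Test helper: largest absolute difference between two grids.
float max_abs_diff(const float a[N][N], const float b[N][N]) {
    float m = 0.0f;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            m = std::fmax(m, std::fabs(a[i][j] - b[i][j]));
    return m;
}
```

With these assumed sizes, the pipelined loop runs W = 4 iterations per row instead of 14, with each iteration updating up to P = 4 points. Whether the tool reaches the requested initiation interval depends on memory port conflicts at segment edges and on the target device, which is the kind of trade-off the evaluation in the chapter measures.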

Acknowledgements

This study was supported by the AE&CC research group (COL0053581) at the Sistemas de Control y Robótica Laboratory, attached to the Instituto Tecnológico Metropolitano. This work is part of the project “Improvement of visual perception in humanoid robots for objects recognition in natural environments using Deep Learning” with ID P17224, co-funded by the Instituto Tecnológico Metropolitano and Universidad de Antioquia.

Author information

Authors and Affiliations

Department of Electronics and Telecommunication Engineering, Instituto Tecnológico Metropolitano ITM, Medellín, Colombia
Luis Castano-Londono, Cristian Alzate Anzola & David Marquez-Viloria
Rynova Research Group, Rymel Company, Medellín, Colombia
Guillermo Gallo
Department of Electrical, Electronics and Computing Engineering, Universidad Nacional de Colombia, Manizales, Colombia
Luis Castano-Londono & Gustavo Osorio

Corresponding author

Correspondence to Luis Castano-Londono.

Editor information

Editors and Affiliations

Department of Industrial Engineering, Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Juan Carlos Figueroa-García
Universidad Antonio Nariño, Bogotá, Colombia
Mario Duarte-González
Universidad Antonio Nariño, Bogotá, Colombia
Sebastián Jaramillo-Isaza
Universidad del Rosario, Bogotá, Colombia
Alvaro David Orjuela-Cañon
Corporación Unificada Nacional CUN, Bogotá, Colombia
Yesid Díaz-Gutierrez

About this paper

Cite this paper

Castano-Londono, L., Alzate Anzola, C., Marquez-Viloria, D., Gallo, G., Osorio, G. (2019). Evaluation of Stencil Based Algorithm Parallelization over System-on-Chip FPGA Using a High Level Synthesis Tool. In: Figueroa-García, J., Duarte-González, M., Jaramillo-Isaza, S., Orjuela-Cañon, A., Díaz-Gutierrez, Y. (eds) Applied Computer Sciences in Engineering. WEA 2019. Communications in Computer and Information Science, vol 1052. Springer, Cham. https://doi.org/10.1007/978-3-030-31019-6_5

DOI: https://doi.org/10.1007/978-3-030-31019-6_5
Published: 09 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31018-9
Online ISBN: 978-3-030-31019-6
eBook Packages: Computer Science, Computer Science (R0)
