On the OpenCL Support for Streaming Fixed-Function Accelerators on Embedded SoC FPGAs

Mousouliotis, Panagiotis; Leppänen, Topi; Jääskeläinen, Pekka; Petrellis, Nikos; Christakos, Panagiotis; Keramidas, Georgios; Antonopoulos, Christos; Voros, Nikolaos

doi:10.1007/978-3-031-42921-7_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14251)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

OpenCL is used in contemporary FPGA High-Level Synthesis (HLS) design tools to develop the host-side code that controls data transfer between the processing system and the FPGA design. High-performance FPGA designs in embedded SoC FPGAs often use data movers with streaming capabilities to transfer data directly between the host’s main memory and the local memory of the FPGA accelerator. Unfortunately, the OpenCL memory model does not currently support streaming data movement between the host system and the FPGA accelerator. Earlier work has shown up to 8x latency improvement in data transfer when streaming data movement is used. To address this issue, this work extends the Portable Computing Language (PoCL) OpenCL framework to support direct streaming data movement between the host’s main memory and the accelerator’s local memory. Furthermore, this work uses the CNN-Grinder workflow to map the execution of a traffic sign recognition Convolutional Neural Network (CNN) onto the SqueezeJet-3 FPGA accelerator, showcasing the details of controlling the SqueezeJet-3 streaming accelerator from a PoCL application. Results show that OpenCL, augmented with direct streaming data transfer between the host and the kernel, can efficiently control an FPGA streaming accelerator on an embedded SoC FPGA and achieve high-performance accelerator execution.
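As a rough illustration of why streaming data movement can lower end-to-end latency, the following toy model compares a staged transfer (copy in, compute, copy out, one chunk at a time) with a three-stage pipeline in which transfers overlap with computation. It is a sketch under stated assumptions, not the authors' method: the function names, the chunking scheme, and the per-chunk timings are hypothetical, and the model does not attempt to reproduce the latency figure reported in earlier work.

```python
def buffered_latency(n_chunks, t_in, t_compute, t_out):
    """Staged model (assumption): each chunk is copied in, processed,
    and copied out before the next chunk starts, so nothing overlaps."""
    return n_chunks * (t_in + t_compute + t_out)


def streaming_latency(n_chunks, t_in, t_compute, t_out):
    """Pipelined model (assumption): input transfer, compute, and output
    transfer run as three overlapping stages over identical chunks."""
    if n_chunks <= 0:
        return 0
    fill = t_in + t_compute + t_out          # first chunk passes through every stage
    bottleneck = max(t_in, t_compute, t_out)  # slowest stage sets the steady-state rate
    return fill + (n_chunks - 1) * bottleneck


if __name__ == "__main__":
    # Hypothetical per-chunk costs in arbitrary time units.
    n, t_in, t_compute, t_out = 64, 2, 3, 1
    staged = buffered_latency(n, t_in, t_compute, t_out)
    streamed = streaming_latency(n, t_in, t_compute, t_out)
    print(f"staged: {staged}, streamed: {streamed}, ratio: {staged / streamed:.2f}")
```

In this simplified model the benefit comes only from overlapping the three stages, so the ratio is bounded by the number of stages; real systems may differ in other respects, such as the number of intermediate copies.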

Notes

1. https://benchmark.ini.rub.de/gtsrb_dataset.html

Acknowledgments

This work has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No 872614 - SMART4ALL: Selfsustained CrossBorder Customized Cyberphysical System Experiments for Capacity Building among European Stakeholders.

Author information

Authors and Affiliations

University of Peloponnese, Patra, Greece
Panagiotis Mousouliotis, Nikos Petrellis, Panagiotis Christakos, Georgios Keramidas, Christos Antonopoulos & Nikolaos Voros
Tampere University, Tampere, Finland
Topi Leppänen & Pekka Jääskeläinen

Corresponding author

Correspondence to Panagiotis Mousouliotis.

Editor information

Editors and Affiliations

Università degli Studi di Sassari, Sassari, Italy
Francesca Palumbo
Aristotle University of Thessaloniki, Thessaloniki, Greece
Georgios Keramidas
University of Peloponnese, Patras, Greece
Nikolaos Voros
University of Porto, Porto, Portugal
Pedro C. Diniz

About this paper

Cite this paper

Mousouliotis, P. et al. (2023). On the OpenCL Support for Streaming Fixed-Function Accelerators on Embedded SoC FPGAs. In: Palumbo, F., Keramidas, G., Voros, N., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2023. Lecture Notes in Computer Science, vol 14251. Springer, Cham. https://doi.org/10.1007/978-3-031-42921-7_4

DOI: https://doi.org/10.1007/978-3-031-42921-7_4
Published: 16 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42920-0
Online ISBN: 978-3-031-42921-7
eBook Packages: Computer Science, Computer Science (R0)
