Supporting On-Chip Dynamic Parallelism for Task-Based Hardware Accelerators

Heinz, Carsten; Koch, Andreas

doi:10.1007/978-3-030-79025-7_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12700)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

The open-source hardware/software framework TaPaSCo aims to make reconfigurable computing on FPGAs more accessible to non-experts. To this end, it provides an easy-to-use task-based programming abstraction and combines it with powerful tool support that automatically implements the individual hardware accelerators and integrates them into usable systems-on-chip. Currently, TaPaSCo relies on the host to manage task parallelism and perform the actual task launches. However, for more expressive parallel programming patterns, such as pipelines of task farms, the round trips from the hardware accelerators back to the host for launching child tasks quickly add up, especially when exploiting data-dependent execution times. The major contribution of this work is the addition of on-chip task scheduling and launching capabilities to TaPaSCo. This not only enables low-latency dynamic task parallelism, but also encompasses the efficient on-chip exchange of parameter values and task results between parent and child accelerator tasks. Our solution handles recursive task structures and is shown to reduce latency by more than 35x compared to prior approaches.
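
To illustrate why host round trips "quickly add up" for recursive task structures, the following is a minimal sketch of a toy cost model, not the TaPaSCo implementation or its API. All names (`LaunchCosts`, `count_tasks`, `child_launch_overhead_us`) and the latency values are hypothetical placeholders chosen for illustration; they are not measurements from the paper. The model assumes every child launch pays one fixed launch latency and that launches are serialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchCosts:
    """Hypothetical per-launch latencies in microseconds (placeholders, not measured)."""
    host_round_trip_us: float   # child launch routed accelerator -> host -> accelerator
    on_chip_dispatch_us: float  # child launch handled by an on-chip scheduler


def count_tasks(depth: int, fanout: int) -> int:
    """Count tasks in a recursive tree where each task spawns `fanout` children
    until `depth` levels of children have been created."""
    if depth < 0 or fanout < 1:
        raise ValueError("depth must be >= 0 and fanout must be >= 1")
    if depth == 0:
        return 1
    return 1 + fanout * count_tasks(depth - 1, fanout)


def child_launch_overhead_us(depth: int, fanout: int, per_launch_us: float) -> float:
    """Total launch overhead for all child tasks, assuming serialized launches.

    The root task is launched by the host in both scenarios, so it is excluded.
    """
    child_launches = count_tasks(depth, fanout) - 1
    return child_launches * per_launch_us


if __name__ == "__main__":
    # Placeholder values only; substitute measured latencies for a real platform.
    costs = LaunchCosts(host_round_trip_us=10.0, on_chip_dispatch_us=1.0)
    for depth in range(1, 5):
        host = child_launch_overhead_us(depth, 2, costs.host_round_trip_us)
        chip = child_launch_overhead_us(depth, 2, costs.on_chip_dispatch_us)
        print(f"depth={depth}: host-mediated={host:.1f} us, on-chip={chip:.1f} us")
```

In this simplified model the number of child launches grows exponentially with recursion depth, so any fixed per-launch cost is multiplied accordingly; a real system would also overlap launches with computation, which the sketch ignores.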

Acknowledgment

This research was funded by the German Federal Ministry for Education and Research (BMBF) in project 01 IS 17091 B.

Author information

Authors and Affiliations

Embedded Systems and Applications Group, TU Darmstadt, Darmstadt, Germany
Carsten Heinz & Andreas Koch

Authors

Carsten Heinz
Andreas Koch

Corresponding author

Correspondence to Carsten Heinz.

Editor information

Editors and Affiliations

IRISA, University of Rennes 1, Rennes, France
Steven Derrien
Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Frank Hannig
INESC-ID, Lisboa, Portugal
Pedro C. Diniz
ENSSAT, University of Rennes 1, Lannion, France
Daniel Chillet

About this paper

Cite this paper

Heinz, C., Koch, A. (2021). Supporting On-Chip Dynamic Parallelism for Task-Based Hardware Accelerators. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_6

DOI: https://doi.org/10.1007/978-3-030-79025-7_6
Published: 23 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79024-0
Online ISBN: 978-3-030-79025-7
eBook Packages: Computer Science, Computer Science (R0)