Supporting On-Chip Dynamic Parallelism for Task-Based Hardware Accelerators

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2021)

Abstract

The open-source hardware/software framework TaPaSCo aims to make reconfigurable computing on FPGAs more accessible to non-experts. To this end, it provides an easy-to-use task-based programming abstraction and combines it with powerful tool support that automatically implements the individual hardware accelerators and integrates them into usable systems-on-chip. Currently, TaPaSCo relies on the host to manage task parallelism and to perform the actual task launches. However, for more expressive parallel programming patterns, such as pipelines of task farms, the round trips from the hardware accelerators back to the host for launching child tasks quickly add up, especially when exploiting data-dependent execution times. The major contribution of this work is the addition of on-chip task scheduling and launching capabilities to TaPaSCo. This not only enables low-latency dynamic task parallelism but also covers the efficient on-chip exchange of parameter values and task results between parent and child accelerator tasks. Our solution handles recursive task structures and is shown to reduce latency by more than 35x compared to prior approaches.
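
Why per-launch round trips add up for recursive task structures can be illustrated with a toy cost model. The sketch below is not TaPaSCo code and uses no TaPaSCo API: the function names, the full-tree task shape, and the per-launch latencies are hypothetical placeholders chosen for illustration, not values from the paper. It only assumes that every child launch pays a fixed overhead and that launches are serialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchCosts:
    """Per-launch overheads in microseconds (placeholder values, not measurements)."""
    host_round_trip_us: float
    on_chip_us: float


def count_child_launches(depth: int, fanout: int) -> int:
    """Number of child tasks launched in a full task tree of the given depth.

    The root task (level 0) is launched by the host in both schemes, so only
    tasks at levels 1..depth count as child launches.
    """
    return sum(fanout ** level for level in range(1, depth + 1))


def launch_overhead_us(depth: int, fanout: int, per_launch_us: float) -> float:
    """Total launch overhead, assuming every child launch is serialized."""
    return count_child_launches(depth, fanout) * per_launch_us


# Hypothetical latencies: a host round trip is assumed to cost 10x an on-chip launch.
costs = LaunchCosts(host_round_trip_us=10.0, on_chip_us=1.0)
children = count_child_launches(depth=3, fanout=2)  # 2 + 4 + 8 child tasks
host_total = launch_overhead_us(3, 2, costs.host_round_trip_us)
chip_total = launch_overhead_us(3, 2, costs.on_chip_us)
print(children, host_total, chip_total)
```

In this model the number of child launches grows geometrically with the recursion depth, so any fixed per-launch saving is multiplied by the size of the task tree. Real systems overlap launches and execution, so the model only indicates the trend.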

Acknowledgment

This research was funded by the German Federal Ministry for Education and Research (BMBF) in project 01 IS 17091 B.

Author information

Corresponding author

Correspondence to Carsten Heinz.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Heinz, C., Koch, A. (2021). Supporting On-Chip Dynamic Parallelism for Task-Based Hardware Accelerators. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_6

  • DOI: https://doi.org/10.1007/978-3-030-79025-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79024-0

  • Online ISBN: 978-3-030-79025-7

  • eBook Packages: Computer Science, Computer Science (R0)
