Abstract
FPGAs are an established part of modern data centres, where they serve as hardware accelerators that speed up applications and adapt dynamically to current compute requirements. Overlay architectures provide a flexible system that lets the hardware accelerator adapt its applications by exchanging (sub-)functions at run-time. Such overlay architectures usually consist of multiple run-time reconfigurable tiles, which can be connected to form an application-specific accelerator. In this paper, we present StreamGrid, an AXI-Stream-compliant overlay architecture with an advanced multi-stream routing architecture, memory (DDR4, HBM) access for the application, and a configuration and monitoring system. Furthermore, we explore the impact of buffering strategies, grid size, and the data width of the AXI-Stream interface on resource utilization and achievable clock frequency. The fastest configuration of the overlay architecture reaches a maximum clock frequency of 752 MHz on a Xilinx Alveo U280 FPGA card. In addition, a case study of a database query engine is evaluated and compared to a static design with the same functionality. The raw execution performance is comparable for both designs, but the setup time is drastically reduced from several tens of minutes to less than 3 ms, efficiently enabling hardware-accelerated queries.
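As background for the AXI-Stream compliance and buffering questions mentioned above, the following is a minimal sketch of the AXI-Stream handshake rule: a data beat is transferred only in a clock cycle where the producer asserts TVALID and the consumer asserts TREADY. It is an illustrative software model, not part of StreamGrid. The function name `simulate_link` and the `ready_pattern` parameter are hypothetical, and the producer is assumed to always have data available.

```python
from typing import List, Sequence, Tuple


def simulate_link(data: Sequence[int],
                  ready_pattern: Sequence[bool]) -> Tuple[List[int], int]:
    """Model one AXI-Stream hop cycle by cycle (illustrative only).

    `ready_pattern` is a hypothetical, repeating per-cycle TREADY schedule
    for the consumer. Returns the received beats and the cycles consumed.
    """
    if not any(ready_pattern):
        raise ValueError("consumer never asserts TREADY; link would stall forever")

    received: List[int] = []
    index = 0   # next beat the producer wants to send
    cycle = 0   # elapsed clock cycles
    while index < len(data):
        tvalid = True  # assumption: producer always has data to offer
        tready = ready_pattern[cycle % len(ready_pattern)]
        if tvalid and tready:
            # Handshake completes: the beat moves in this cycle.
            received.append(data[index])
            index += 1
        # Otherwise the producer holds the same beat (backpressure).
        cycle += 1
    return received, cycle


if __name__ == "__main__":
    print(simulate_link([1, 2, 3], [True]))         # consumer always ready
    print(simulate_link([1, 2, 3], [True, False]))  # consumer ready every other cycle
```

In this model, a consumer that is ready only every other cycle needs five cycles for three beats instead of three, which hints at why buffering between tiles can affect sustained throughput.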
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Blochwitz, C., Philipp, L., Berekovic, M., Pionteck, T. (2021). StreamGrid - An AXI-Stream-Compliant Overlay Architecture. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_11
DOI: https://doi.org/10.1007/978-3-030-79025-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79024-0
Online ISBN: 978-3-030-79025-7
eBook Packages: Computer Science (R0)