Abstract
Since the Transformer was proposed, the self-attention mechanism has been widely adopted, and several studies have applied it to computer vision (CV). However, because self-attention lacks some of the inductive biases inherent to CNNs, it generalizes poorly when training data are insufficient. To address this problem, researchers have proposed combining convolution modules with self-attention modules so that convolution supplies the inductive biases that self-attention lacks, and many models built on this idea have achieved good results. Conventional CPU architectures, however, cannot fully exploit the parallelism of these models. Among the available computing platforms, FPGAs, with their high degree of parallelism, are well suited to accelerating such algorithms. At the same time, we note that combined convolution and self-attention modules have received little attention in terms of acceleration, so customizing computational units on FPGAs to improve model parallelism is a feasible solution. In this paper, we optimize the parallelism of a combined convolution and self-attention model and, from a hardware-software co-optimization perspective, design algorithmic optimizations for two of its most complex generic nonlinear functions, further reducing the hardware complexity and the latency of the whole system, together with the corresponding hardware modules. The design is coded in a hardware description language (HDL) and simulated on a Xilinx FPGA. The experimental results show that the ZCU216 FPGA-based design greatly reduces hardware resource consumption compared to a conventional design, while increasing throughput by 8.82\(\times \) and 1.23\(\times \) over a CPU and a GPU, respectively.
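The abstract does not identify the two nonlinear functions, but in Transformer-style accelerators they are typically softmax and GELU. As a minimal sketch only, assuming softmax is one of them, the C golden model below illustrates the online-normalizer formulation of softmax commonly used to simplify such hardware: the running maximum and the rescaled running sum are fused into a single pass over the attention scores, so the usual three passes (max, sum, divide) shrink to two.

/* Illustrative software golden model only: the paper's actual HDL
 * modules and chosen approximations are not specified in the abstract. */
#include <math.h>
#include <stdio.h>

/* Online softmax: track the running maximum m and the running
 * normalizer d in one pass, rescaling d whenever m changes. */
static void online_softmax(const float *x, float *y, int n) {
    float m = -INFINITY; /* running maximum of the scores */
    float d = 0.0f;      /* running sum of exp(x[i] - m)  */
    for (int i = 0; i < n; i++) {
        float m_new = x[i] > m ? x[i] : m;
        d = d * expf(m - m_new) + expf(x[i] - m_new);
        m = m_new;
    }
    for (int i = 0; i < n; i++) /* second (and final) pass: divide */
        y[i] = expf(x[i] - m) / d;
}

int main(void) {
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f}, y[4];
    online_softmax(x, y, 4);
    for (int i = 0; i < 4; i++)
        printf("%.4f ", y[i]); /* prints values summing to 1 */
    printf("\n");
    return 0;
}

In hardware, fusing the maximum and the normalizer removes one full traversal of the score vector per attention row, which is one plausible route to the latency reduction the abstract claims; the exponentials themselves would typically be approximated with lookup tables or piecewise-linear units.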
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Hu, W., Li, H., Liu, F., Zhong, Z.: Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds.) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol. 14333. Springer, Singapore (2024). https://doi.org/10.1007/978-981-97-2387-4_22