Abstract
Since the attention mechanism was proposed, there has been extensive research on combining deep learning with visual attention mechanisms. Among these, models built on the self-attention mechanism have achieved state-of-the-art (SOTA) results in computer vision. However, the large parameter counts and computational complexity of such models hinder their development and limit their use on resource-constrained devices and platforms. In this paper, we improve the Windows Attention of the Swin Transformer from the perspective of software and hardware co-optimization and parallelize the design on an FPGA platform. In the Softmax module, we replace the exp function, which requires considerable computing resources, with a Taylor expansion, at the cost of only a small loss of accuracy; we also optimize the computation of matrix multiplication, which is invoked many times, and design a corresponding hardware module. Experimental results show that resource consumption on the ZCU102 FPGA decreases by 93% compared to the conventional exp function, and that throughput improves by 7.73\(\times\) and 1.21\(\times\) over CPU and GPU baselines, respectively.
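To make the Softmax optimization concrete, below is a minimal C++ sketch of a Taylor-approximated Softmax. The expansion order (third), the standard max-subtraction trick, and the clamping of negative polynomial values are assumptions for illustration only; the paper does not specify these details, and its HLS hardware implementation is not reproduced here.

```cpp
// Minimal sketch of a Softmax that replaces exp with a Taylor polynomial.
// Hypothetical: the paper does not state its expansion order or range handling.
#include <cstdio>
#include <vector>
#include <algorithm>

// Third-order Taylor expansion of exp(x) around 0:
//   exp(x) ~= 1 + x + x^2/2 + x^3/6, accurate for small |x|.
static float exp_taylor3(float x) {
    float p = 1.0f + x + x * x / 2.0f + x * x * x / 6.0f;
    // The polynomial can go negative for large negative x; clamp so the
    // Softmax weights stay non-negative (an assumption, not from the paper).
    return std::max(p, 0.0f);
}

// Softmax with the usual max-subtraction for numerical stability; after the
// shift every argument is <= 0, and the row maximum maps to exp(0) = 1,
// so the normalizing sum is always >= 1.
void softmax_taylor(const std::vector<float>& in, std::vector<float>& out) {
    float m = *std::max_element(in.begin(), in.end());
    out.resize(in.size());
    float sum = 0.0f;
    for (size_t i = 0; i < in.size(); ++i) {
        out[i] = exp_taylor3(in[i] - m);
        sum += out[i];
    }
    for (float& v : out) v /= sum;
}

int main() {
    std::vector<float> scores = {1.0f, 2.0f, 0.5f, 1.5f}, probs;
    softmax_taylor(scores, probs);
    for (float p : probs) std::printf("%.4f ", p);
    std::printf("\n");
    return 0;
}
```

A polynomial of this form maps to a few multipliers and adders on an FPGA, which is the kind of resource saving over a full exp unit that the abstract's 93% figure refers to.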
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hu, W., Hu, K., Liu, F., Fan, J. (2022). Hardware and Software Co-optimization for Windows Attention. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds.) Knowledge Science, Engineering and Management (KSEM 2022). Lecture Notes in Computer Science, vol. 13370. Springer, Cham. https://doi.org/10.1007/978-3-031-10989-8_52
DOI: https://doi.org/10.1007/978-3-031-10989-8_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10988-1
Online ISBN: 978-3-031-10989-8