Abstract
Since the attention mechanism was proposed, there has been extensive research on combining deep learning with visual attention mechanisms. Among these, models built on the self-attention mechanism have achieved state-of-the-art (SOTA) results in computer vision. However, the large parameter counts and computational complexity of such models hinder their development and limit their use on resource-constrained devices and platforms. In this paper, we improve the Windows Attention of the Swin Transformer from the perspective of software and hardware co-optimization and parallelize the design on an FPGA platform. In the Softmax module, we replace the exp function, which requires considerable computing resources, with a Taylor expansion, at the cost of only a small loss of accuracy; we also optimize the computation of matrix multiplication, which is invoked many times, and design a corresponding hardware module. Experimental results show that resource consumption on the ZCU102 FPGA decreases by 93% compared to the conventional exp function, and that throughput improves by 7.73\(\times\) and 1.21\(\times\) over CPU and GPU baselines, respectively.
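To make the Softmax optimization concrete, below is a minimal C++ sketch of a Taylor-approximated Softmax. The expansion order (third), the standard max-subtraction trick, and the clamping of negative polynomial values are assumptions for illustration only; the paper does not specify these details, and its HLS hardware implementation is not reproduced here.

```cpp
// Minimal sketch of a Softmax that replaces exp with a Taylor polynomial.
// Hypothetical: the paper does not state its expansion order or range handling.
#include <cstdio>
#include <vector>
#include <algorithm>

// Third-order Taylor expansion of exp(x) around 0:
//   exp(x) ~= 1 + x + x^2/2 + x^3/6, accurate for small |x|.
static float exp_taylor3(float x) {
    float p = 1.0f + x + x * x / 2.0f + x * x * x / 6.0f;
    // The polynomial can go negative for large negative x; clamp so the
    // Softmax weights stay non-negative (an assumption, not from the paper).
    return std::max(p, 0.0f);
}

// Softmax with the usual max-subtraction for numerical stability; after the
// shift every argument is <= 0, and the row maximum maps to exp(0) = 1,
// so the normalizing sum is always >= 1.
void softmax_taylor(const std::vector<float>& in, std::vector<float>& out) {
    float m = *std::max_element(in.begin(), in.end());
    out.resize(in.size());
    float sum = 0.0f;
    for (size_t i = 0; i < in.size(); ++i) {
        out[i] = exp_taylor3(in[i] - m);
        sum += out[i];
    }
    for (float& v : out) v /= sum;
}

int main() {
    std::vector<float> scores = {1.0f, 2.0f, 0.5f, 1.5f}, probs;
    softmax_taylor(scores, probs);
    for (float p : probs) std::printf("%.4f ", p);
    std::printf("\n");
    return 0;
}
```

A polynomial of this form maps to a few multipliers and adders on an FPGA, which is the kind of resource saving over a full exp unit that the abstract's 93% figure refers to.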
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hu, W., Hu, K., Liu, F., Fan, J. (2022). Hardware and Software Co-optimization for Windows Attention. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds.) Knowledge Science, Engineering and Management (KSEM 2022). Lecture Notes in Computer Science, vol. 13370. Springer, Cham. https://doi.org/10.1007/978-3-031-10989-8_52
DOI: https://doi.org/10.1007/978-3-031-10989-8_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10988-1
Online ISBN: 978-3-031-10989-8