
Hardware and Software Co-optimization for Windows Attention

  • Conference paper
  • Knowledge Science, Engineering and Management (KSEM 2022)

Abstract

Since the attention mechanism was proposed, there has been extensive research on combining deep learning with visual attention mechanisms. Among these, models built on the self-attention mechanism have achieved SOTA results in computer vision. However, the large number of parameters and the computational complexity of such models hinder their development and limit their use on resource-constrained devices and platforms. In this paper, we improve the Windows Attention of the Swin Transformer from the perspective of software and hardware co-optimization and parallelize the design on an FPGA platform. In the Softmax module, we replace the resource-intensive exp function with a Taylor expansion at the cost of only a small loss in accuracy; we also optimize the computation of matrix multiplication, which is invoked many times, and design the corresponding hardware module. Experimental results show that resource consumption on the ZCU102 FPGA decreases by 93% compared to the conventional exp function, and throughput improves by 7.73× and 1.21× compared to a CPU and a GPU, respectively.
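As context for the Softmax optimization described in the abstract, the following is a minimal software sketch, not the authors' FPGA/HLS design, of a softmax in which exp(x) is replaced by a truncated Taylor polynomial. The expansion order, the max-subtraction step, and the clamping guard are illustrative assumptions; the abstract does not state these details.

```cpp
#include <cstddef>
#include <vector>

// Degree-3 Taylor polynomial for exp(x) around 0, evaluated in Horner form.
// The order is an assumption; a real design would choose it to balance
// accuracy loss against DSP/LUT usage.
static float exp_taylor(float x) {
    return 1.0f + x * (1.0f + x * (0.5f + x * (1.0f / 6.0f)));
}

// Softmax over one row of attention scores using the approximated exp.
std::vector<float> softmax_taylor(const std::vector<float>& scores) {
    // Subtract the row maximum so the polynomial is evaluated near 0,
    // where the truncated expansion is most accurate.
    float max_val = scores[0];
    for (float s : scores) max_val = (s > max_val) ? s : max_val;

    std::vector<float> out(scores.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        float e = exp_taylor(scores[i] - max_val);
        // Guard against truncation error: the cubic polynomial can go
        // negative for large negative inputs, which the true exp never does.
        out[i] = (e > 0.0f) ? e : 0.0f;
        sum += out[i];
    }
    for (float& v : out) v /= sum;  // normalize so the row sums to 1
    return out;
}
```

In a hardware implementation, the polynomial evaluation and the accumulation loops would typically be pipelined or unrolled; this sketch only illustrates the numerical substitution that trades a full exp unit for a few multiply-add operations.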



Author information

Corresponding author

Correspondence to Kejie Hu.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hu, W., Hu, K., Liu, F., Fan, J. (2022). Hardware and Software Co-optimization for Windows Attention. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science, vol. 13370. Springer, Cham. https://doi.org/10.1007/978-3-031-10989-8_52

  • DOI: https://doi.org/10.1007/978-3-031-10989-8_52

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10988-1

  • Online ISBN: 978-3-031-10989-8

  • eBook Packages: Computer Science, Computer Science (R0)
