Abstract
Attention mechanisms have recently achieved excellent performance across a wide range of neural network applications. However, they have two notable shortcomings. First, their high computational and memory cost makes them difficult to apply to long sequences. Second, every token participates in computing the attention map, which amplifies the influence of noisy tokens and can degrade training. Because of these shortcomings, attention models are usually strictly limited in sequence length and struggle to bring their strengths to bear on long-sequence modelling. To address these problems, this paper proposes an efficient sparse attention mechanism (SSA). SSA consists of two separate layers, a local layer and a global layer, which jointly encode local sequence information and global context. This new sparse attention pattern substantially accelerates inference. The experiments in this paper validate the effectiveness of SSA by replacing the self-attention structure with the SSA structure in a variety of transformer models. SSA achieves state-of-the-art performance on several major benchmarks and was validated on a variety of datasets and models covering language translation, language modelling and image recognition. Alongside a small improvement in accuracy, inference speed increased by 24%.
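To make the local/global design concrete, below is a minimal PyTorch sketch of one way such a two-layer sparse attention block could be assembled. The abstract does not specify the paper's actual construction, so the windowed local attention, the strided choice of global tokens, and all names here (SparseAttentionSketch, window_size, num_global) are illustrative assumptions rather than the authors' method.

# A minimal sketch of the two-layer idea described in the abstract.
# Assumptions (not from the paper): local attention is restricted to
# fixed non-overlapping windows, and the global layer attends to a
# small strided sample of the sequence.
import torch
import torch.nn as nn


class SparseAttentionSketch(nn.Module):
    def __init__(self, dim, window_size=16, num_global=8):
        super().__init__()
        self.window_size = window_size
        self.num_global = num_global
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, seq_len, dim); seq_len assumed divisible by window_size.
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Local layer: full attention restricted to non-overlapping windows,
        # so cost grows linearly with sequence length instead of quadratically.
        w = self.window_size
        qw = q.reshape(b, n // w, w, d)
        kw = k.reshape(b, n // w, w, d)
        vw = v.reshape(b, n // w, w, d)
        local_scores = torch.einsum("bswd,bstd->bswt", qw, kw) * self.scale
        local_out = torch.einsum(
            "bswt,bstd->bswd", local_scores.softmax(-1), vw
        ).reshape(b, n, d)

        # Global layer: every position attends to a small set of summary
        # tokens (here simply strided samples of the keys/values), giving
        # each token a coarse view of the whole sequence.
        stride = max(n // self.num_global, 1)
        kg, vg = k[:, ::stride], v[:, ::stride]
        global_scores = torch.einsum("bnd,bmd->bnm", q, kg) * self.scale
        global_out = torch.einsum("bnm,bmd->bnd", global_scores.softmax(-1), vg)

        # The two layers jointly encode local detail and global context.
        return self.proj(local_out + global_out)


if __name__ == "__main__":
    layer = SparseAttentionSketch(dim=64)
    out = layer(torch.randn(2, 128, 64))
    print(out.shape)  # torch.Size([2, 128, 64])

Restricting the local layer to fixed windows and the global layer to a handful of summary tokens is what brings the cost down from quadratic to roughly linear in sequence length, matching the motivation stated in the abstract.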
Cite this paper
Sun, Y., Hu, W., Liu, F., Huang, F., Wang, Y. (2022). SSA: A Content-Based Sparse Attention Mechanism. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science, vol 13370. Springer, Cham. https://doi.org/10.1007/978-3-031-10989-8_53