Bidirectional Transformer with absolute-position aware relative position encoding for encoding sentences

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Transformers have been widely studied in many natural language processing (NLP) tasks; they can capture dependencies across the whole sentence with high parallelizability thanks to multi-head attention and the position-wise feed-forward network. However, both of these components are position-independent, which makes transformers weak at modeling sentence structure. Existing studies commonly use positional encodings or mask strategies to capture the structural information of sentences. In this paper, we aim to strengthen the ability of transformers to model the linear structure of sentences from three aspects: the absolute position of tokens, the relative distance between tokens, and the direction between tokens. We propose a novel bidirectional Transformer with absolute-position aware relative position encoding (BiAR-Transformer) that combines positional encoding and the mask strategy. We model the relative distance between tokens along with their absolute positions by a novel absolute-position aware relative position encoding. Meanwhile, we apply a bidirectional mask strategy to model the direction between tokens. Experimental results on natural language inference, paraphrase identification, sentiment classification and machine translation tasks show that BiAR-Transformer achieves superior performance over other strong baselines.
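To make the two ideas in the abstract concrete, below is a minimal sketch of a self-attention layer that (a) adds a relative-distance bias conditioned on absolute positions and (b) runs under forward and backward masks whose outputs are then combined. This is only an illustration under stated assumptions: the class name AbsoluteAwareRelativeAttention, the choice to mix the absolute and relative embeddings as additive attention biases, and the summation of the two directional passes are hypothetical, not the authors' published formulation.

```python
# Hypothetical sketch of the two mechanisms described in the abstract; the
# exact parameterization in the paper may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AbsoluteAwareRelativeAttention(nn.Module):
    def __init__(self, d_model, n_heads, max_len=512, max_rel_dist=64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Per-head bias from the query token's absolute position.
        self.abs_pos = nn.Embedding(max_len, n_heads)
        # Per-head bias from the (clipped) relative distance between tokens.
        self.rel_pos = nn.Embedding(2 * max_rel_dist + 1, n_heads)
        self.max_rel_dist = max_rel_dist

    def forward(self, x, direction="forward"):
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, n, self.n_heads, self.d_head).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (b, h, n, n)

        # Relative-distance bias made "absolute-position aware" by adding a
        # per-query absolute-position term (one possible interpretation).
        pos = torch.arange(n, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist,
                                                  self.max_rel_dist)
        rel_bias = self.rel_pos(rel + self.max_rel_dist)   # (n, n, h)
        abs_bias = self.abs_pos(pos)[:, None, :]            # (n, 1, h)
        bias = (rel_bias + abs_bias).permute(2, 0, 1)        # (h, n, n)
        scores = scores + bias[None]

        # Bidirectional mask strategy: one pass attends only to preceding
        # tokens, the other only to following tokens.
        if direction == "forward":
            mask = torch.ones(n, n, device=x.device).tril().bool()
        else:
            mask = torch.ones(n, n, device=x.device).triu().bool()
        scores = scores.masked_fill(~mask, float("-inf"))

        attn = F.softmax(scores, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(ctx)


if __name__ == "__main__":
    layer = AbsoluteAwareRelativeAttention(d_model=64, n_heads=4)
    x = torch.randn(2, 10, 64)
    # Combine the forward and backward passes, e.g. by summation (assumed).
    y = layer(x, "forward") + layer(x, "backward")
    print(y.shape)  # torch.Size([2, 10, 64])
```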

Acknowledgements

This work was supported by the Key Development Program of the Ministry of Science and Technology (2019YFF0303003), the National Natural Science Foundation of China (Grant No. 61976068), and the "Hundreds, Millions" Engineering Science and Technology Major Special Project of Heilongjiang Province (2020ZX14A02).

Author information

Corresponding author

Correspondence to Yu Zhang.

Additional information

Le Qi is a PhD student in the Research Center for Social Computing and Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology, China. His research interests lie in semantic text matching and question answering systems.

Yu Zhang is a professor in the Research Center for Social Computing and Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology, China. His primary research interests are question answering and personalized information retrieval.

Ting Liu is a professor in the Research Center for Social Computing and Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology, China. His primary research interests are natural language processing, information retrieval, and social computing.

About this article

Cite this article

Qi, L., Zhang, Y. & Liu, T. Bidirectional Transformer with absolute-position aware relative position encoding for encoding sentences. Front. Comput. Sci. 17, 171301 (2023). https://doi.org/10.1007/s11704-022-0610-2

  • DOI: https://doi.org/10.1007/s11704-022-0610-2
