Protoformer: Embedding Prototypes for Transformers

  • Conference paper

Advances in Knowledge Discovery and Data Mining (PAKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13280)

Abstract

Transformers have been widely applied to text classification. Unfortunately, real-world data contain anomalies and noisy labels that pose challenges for state-of-the-art Transformers. This paper proposes Protoformer, a novel self-learning framework for Transformers that can leverage problematic samples for text classification. Protoformer features a selection mechanism for embedding samples that allows us to efficiently extract and utilize anomaly prototypes and difficult class prototypes. We demonstrate these capabilities on datasets with diverse textual structures (e.g., Twitter, IMDB, ArXiv) and apply the framework to several models. The results indicate that Protoformer can improve current Transformers in various empirical settings.
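The paper body details the selection mechanism; as a minimal sketch of the general idea only, assuming cosine similarity over per-class Transformer embeddings (the function name, the centroid heuristic, and the farthest-point anomaly rule are illustrative assumptions, not the authors' algorithm):

    import numpy as np

    def pick_prototypes(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Pick one 'easy' and one 'hard' sample from a class's embeddings.

        The sample closest to the class centroid serves as a class prototype;
        the sample farthest from it as an anomaly candidate. Centroid
        proximity is a stand-in heuristic, not Protoformer's criterion.
        """
        # Row-normalise so dot products are cosine similarities.
        x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        centroid = x.mean(axis=0)
        centroid = centroid / np.linalg.norm(centroid)
        sims = x @ centroid
        return embeddings[np.argmax(sims)], embeddings[np.argmin(sims)]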


Notes

  1. For large-scale datasets, one can randomly choose a limited number (e.g., \(q\)) of samples per class to build a triangular similarity matrix \(S^{q\times q}\), which improves computational efficiency (see the sketch after these notes).

  2. \(\mathrm{sign}(x) = 1\) for \(x > 0\), \(\mathrm{sign}(x) = 0\) for \(x = 0\), and \(\mathrm{sign}(x) = -1\) otherwise.

  3. Self-gathered datasets are accessible at https://github.com/ashfarhangi/Protoformer.
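
Footnote 1 leaves the matrix construction implicit; a minimal sketch of the subsampling trick, again assuming cosine similarity over the same embeddings (the function name, \(q\), and the seed are illustrative):

    import numpy as np

    def subsampled_similarity(embeddings: np.ndarray, q: int, seed: int = 0) -> np.ndarray:
        """Upper-triangular cosine-similarity matrix over q random samples.

        Subsampling caps the cost at O(q^2) regardless of class size, and
        since S is symmetric only the strictly upper triangle is kept.
        """
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(embeddings), size=min(q, len(embeddings)), replace=False)
        x = embeddings[idx]
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return np.triu(x @ x.T, k=1)  # zero out the diagonal and lower triangle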


Acknowledgement

Our work has been supported by the US National Science Foundation under grant Nos. 2028481, 1937833, and 1850851.

Author information


Corresponding author

Correspondence to Ashkan Farhangi.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Farhangi, A., Sui, N., Hua, N., Bai, H., Huang, A., Guo, Z. (2022). Protoformer: Embedding Prototypes for Transformers. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13280. Springer, Cham. https://doi.org/10.1007/978-3-031-05933-9_35

  • DOI: https://doi.org/10.1007/978-3-031-05933-9_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05932-2

  • Online ISBN: 978-3-031-05933-9

  • eBook Packages: Computer Science, Computer Science (R0)
