Abstract
In recent years, the success of transformer-based models, initially employed in Natural Language Processing (NLP) tasks, has led to the development of several transformer variants for a wide range of domains, such as vision. With sufficient training data and proper training, transformers can perform excellently compared to their Convolutional Neural Network (CNN) counterparts in vision tasks. However, the main drawbacks of transformers are their well-known memory requirements, which grow quadratically with the input image size and often exceed the capacity of the available training platform, and a strong tendency to overfit.
Several works address the memory problem with relaxed architecture variants, but usually at the cost of reduced predictive capability. In this work, we evaluate random patch erasing at the image-patch level of the transformer model as a regularization technique that reduces overfitting while also shortening training time. The evaluated regularization technique achieves competitive results on several medical image classification datasets. The evaluated Vision Transformer (ViT) models can be trained on a single GPU, reaching results similar to their CNN counterparts, obtaining accuracies of 91.2% and 79.2% on two competitive image datasets and reducing training time by 22% on average for the transformer models.
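The idea of patch-level erasing can be illustrated with a minimal NumPy sketch: a random subset of the image-patch tokens is dropped before they enter the transformer encoder, which both regularizes training and shortens the token sequence. The function name, drop ratio, and shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def random_patch_erase(patch_tokens, drop_ratio=0.3, rng=None):
    """Randomly drop a fraction of image-patch tokens (patch-level erasing).

    patch_tokens: array of shape (num_patches, embed_dim)
    Returns only the kept tokens, preserving their spatial order.
    """
    rng = rng or np.random.default_rng()
    n = patch_tokens.shape[0]
    keep = int(round(n * (1.0 - drop_ratio)))
    idx = rng.choice(n, size=keep, replace=False)
    idx.sort()  # keep the surviving patches in their original order
    return patch_tokens[idx]

# 196 patches (14 x 14 grid) with 768-dim embeddings, as in a ViT-Base setup
tokens = np.zeros((196, 768))
kept = random_patch_erase(tokens, drop_ratio=0.25)
print(kept.shape)  # (147, 768)
```

Because self-attention cost is quadratic in sequence length, erasing 25% of the patches also reduces the per-step compute, which is consistent with the reported reduction in training time.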
Acknowledgements
This work is co-financed by Component 5 - Capitalization and Business Innovation, integrated in the Resilience Dimension of the Recovery and Resilience Plan within the scope of the Recovery and Resilience Mechanism (MRR) of the European Union (EU), framed in the Next Generation EU, for the period 2021-2026, and by National Funds through the Portuguese funding agency, FCT - Foundation for Science and Technology, Portugal, under PhD Grant Number 2021.06275.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Oliveira, H.S., Ribeiro, P.P., Oliveira, H.P. (2023). Evaluation of Regularization Techniques for Transformers-Based Models. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1
eBook Packages: Computer Science (R0)