Abstract
In recent years, the success of transformer-based models, initially employed in Natural Language Processing (NLP) tasks, has led to the development of several transformer variants for a wide range of domains, such as vision. With sufficient training data and proper training, transformers can perform excellently compared to their Convolutional Neural Network (CNN) counterparts in vision tasks. However, the main drawbacks of transformers are their well-known memory requirements, which grow quadratically with the input image size and often exceed the capacity of the available training platform, and a strong tendency to overfit.
Several works address the memory problem with relaxed architecture variants, but usually at the cost of reduced predictive capability. In this work, we evaluate random patch erasing at the image-patch level of the transformer model as a regularization technique that reduces overfitting while also shortening training time. The evaluated regularization technique achieves competitive results on several medical image classification datasets. The evaluated Vision Transformer (ViT) models can be trained on a single GPU, reaching results similar to their CNN counterparts, obtaining accuracies of 91.2% and 79.2% on two competitive image datasets and reducing training time by 22% on average for the transformer models.
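The idea of patch-level erasing can be illustrated with a minimal NumPy sketch: a random subset of the image-patch tokens is dropped before they enter the transformer encoder, which both regularizes training and shortens the token sequence. The function name, drop ratio, and shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def random_patch_erase(patch_tokens, drop_ratio=0.3, rng=None):
    """Randomly drop a fraction of image-patch tokens (patch-level erasing).

    patch_tokens: array of shape (num_patches, embed_dim)
    Returns only the kept tokens, preserving their spatial order.
    """
    rng = rng or np.random.default_rng()
    n = patch_tokens.shape[0]
    keep = int(round(n * (1.0 - drop_ratio)))
    idx = rng.choice(n, size=keep, replace=False)
    idx.sort()  # keep the surviving patches in their original order
    return patch_tokens[idx]

# 196 patches (14 x 14 grid) with 768-dim embeddings, as in a ViT-Base setup
tokens = np.zeros((196, 768))
kept = random_patch_erase(tokens, drop_ratio=0.25)
print(kept.shape)  # (147, 768)
```

Because self-attention cost is quadratic in sequence length, erasing 25% of the patches also reduces the per-step compute, which is consistent with the reported reduction in training time.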
Acknowledgements
This work is co-financed by Component 5 - Capitalization and Business Innovation, integrated in the Resilience Dimension of the Recovery and Resilience Plan within the scope of the Recovery and Resilience Mechanism (MRR) of the European Union (EU), framed in the Next Generation EU, for the period 2021-2026, and by National Funds through the Portuguese funding agency, FCT - Foundation for Science and Technology, Portugal, under PhD Grant Number 2021.06275.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Oliveira, H.S., Ribeiro, P.P., Oliveira, H.P. (2023). Evaluation of Regularization Techniques for Transformers-Based Models. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1
eBook Packages: Computer Science (R0)