Research on image recognition of ethnic minority clothing based on improved vision transformer

Taishen Wang; Bin Wen

doi:10.3934/mfc.2022054

2024, Volume 7, Issue 1: 84-97. Doi: 10.3934/mfc.2022054

Taishen Wang^1 and Bin Wen^2,*

1. School of Information Science and Technology, Yunnan Normal University, Kunming 650500, China
2. Yunnan Key Laboratory of Smart Education, Yunnan Normal University, Kunming 650500, China

^* Corresponding author: Bin Wen

Received: May 2022

Revised: September 2022

Early access: November 2022

Published: February 2024

Abstract

Ethnic minority costumes have complex ornamentation and distinctive composition, which limits the performance of current costume image recognition algorithms. Models based on convolutional neural networks can extract deep semantic features from clothing images and perform better on datasets with more images, but they ignore the large-scale features of an image along its height and width dimensions. We therefore propose an improved model based on Vision Transformer. It extracts image features along the height and width directions through asymmetric convolution, feeds them into the Transformer encoder for serialization and encoding, and uses the encoder output to obtain the recognition result. With accuracy as the evaluation metric on the minority clothing dataset, the proposed method outperforms ResNet34 and is 1.2 percentage points more accurate than the classic Vision Transformer.
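To make the idea of asymmetric convolution concrete, here is a minimal pure-Python sketch of a $1\times S$ kernel (sliding along the width) and an $S\times 1$ kernel (sliding along the height) applied to a toy 4x4 image. It is an illustration under stated assumptions, not the authors' layer: the helper `conv2d_valid`, the averaging weights, the value `S = 2`, the strides, and the absence of padding are all hypothetical choices made for this example.

```python
def conv2d_valid(image, kernel, stride):
    """Slide `kernel` over `image` (both lists of rows) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    sh, sw = stride
    out = []
    for top in range(0, len(image) - kh + 1, sh):
        row = []
        for left in range(0, len(image[0]) - kw + 1, sw):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[top + i][left + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


S = 2  # hypothetical kernel length, chosen only for this toy example

# Toy 4x4 single-channel "image" with pixel values 0..15.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

# 1 x S kernel: mixes S neighbouring pixels along the width.
row_kernel = [[1.0 / S] * S]
# S x 1 kernel: mixes S neighbouring pixels along the height.
col_kernel = [[1.0 / S] for _ in range(S)]

# Assumed strides: step by S along the direction the kernel spans.
width_features = conv2d_valid(image, row_kernel, stride=(1, S))   # 4 rows x 2 cols
height_features = conv2d_valid(image, col_kernel, stride=(S, 1))  # 2 rows x 4 cols

print(width_features)
print(height_features)
```

In a real model the kernel weights would be learned, and the two feature maps would be turned into a token sequence before entering the Transformer encoder; that step is omitted here.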

Mathematics Subject Classification: Primary: 68T07, 68T45.

Figure 1. Vision Transformer

Figure 2. Improved embedding layer, taking a $1\times S$ convolution kernel as an example

Figure 3. Improved Transformer encoder

Figure 4. Improved model based on Vision Transformer

Figure 5. Accuracy changes on the training set

Table 1. Symbol definition

Symbol	Definition
$\times$	Multiplication of vectors or matrices
$\oplus$	Concatenation of two vectors
$+$	Element-wise addition of two matrices or vectors
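A small worked example may help fix the notation. The numbers are arbitrary, and reading $\times$ as the standard matrix product is an assumption based on the wording of the table:

```latex
\begin{aligned}
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \times \begin{bmatrix} 5 \\ 6 \end{bmatrix}
  &= \begin{bmatrix} 17 \\ 39 \end{bmatrix}, \\
\begin{bmatrix} 1 & 2 \end{bmatrix} \oplus \begin{bmatrix} 3 & 4 \end{bmatrix}
  &= \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}, \\
\begin{bmatrix} 1 & 2 \end{bmatrix} + \begin{bmatrix} 3 & 4 \end{bmatrix}
  &= \begin{bmatrix} 4 & 6 \end{bmatrix}.
\end{aligned}
```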

Table 2. Software and hardware environment used in the experiment

CPU	Intel Core i7-12700KF
Host Memory	32 GB
GPU	NVIDIA GeForce RTX 3090
GPU Memory	24 GB
Operating System	Windows 11
Programming Language	Python
Deep Learning Framework	PyTorch
Dependency Library	CUDA 11.3

Table 3. Definitions of TP and FN

Prediction	Number of Samples Belonging to the Current Class
Number of Samples Predicted as the Current Class	TP
Number of Samples Predicted as Another Class	FN
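With TP and FN defined this way, per-class recall follows the standard formula TP / (TP + FN). The sketch below assumes that standard definition, since the paper's own formula is not reproduced here. The function names and the sample labels are hypothetical and are not taken from the paper's dataset.

```python
def recall(tp, fn):
    """Recall for one class: TP / (TP + FN)."""
    total = tp + fn
    if total == 0:
        raise ValueError("class has no samples")
    return tp / total


def per_class_recall(y_true, y_pred):
    """Count TP and FN for every class in y_true and return its recall."""
    result = {}
    for cls in sorted(set(y_true)):
        # TP: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        # FN: samples of this class that were predicted as another class.
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        result[cls] = recall(tp, fn)
    return result


# Hypothetical labels, for illustration only.
y_true = ["Hani", "Hani", "Wa", "Wa", "Yi", "Yi", "Yi", "Yi"]
y_pred = ["Hani", "Hani", "Wa", "Hani", "Yi", "Yi", "Yi", "Wa"]

scores = per_class_recall(y_true, y_pred)
print(scores)  # {'Hani': 1.0, 'Wa': 0.5, 'Yi': 0.75}
```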

Table 4. Results on the Test Set

Model	Accuracy	Recall (Hani)	Recall (Wa)	Recall (Yi)	AUC
ViT base	98.6%	99.12%	99.65%	90.24%	0.9863
ViT Improvement	99.5%	100.00%	99.65%	97.56%	0.9994
ViT Improvement+mask	99.8%	100.00%	100.00%	97.56%	0.9997
Inception v3	99.1%	98.23%	99.31%	100.00%	0.9993
ResNet34	99.3%	99.12%	99.31%	100.00%	0.9965
DenseNet121	99.5%	100.00%	99.65%	97.56%	0.9981
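The accuracy gaps between models can be checked directly from the table. The snippet below copies the accuracy column and computes differences in percentage points; the helper `gap` is a hypothetical name introduced for this check. The 1.2-point gap reported in the abstract appears to correspond to the "ViT Improvement+mask" row compared with "ViT base".

```python
# Accuracy (%) copied from the table above.
accuracy = {
    "ViT base": 98.6,
    "ViT Improvement": 99.5,
    "ViT Improvement+mask": 99.8,
    "Inception v3": 99.1,
    "ResNet34": 99.3,
    "DenseNet121": 99.5,
}


def gap(model, baseline):
    """Accuracy difference in percentage points, rounded to one decimal."""
    return round(accuracy[model] - accuracy[baseline], 1)


best = max(accuracy, key=accuracy.get)

print(best)                                   # ViT Improvement+mask
print(gap("ViT Improvement+mask", "ViT base"))  # 1.2
print(gap("ViT Improvement+mask", "ResNet34"))  # 0.5
```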
