
Research on image recognition of ethnic minority clothing based on improved vision transformer

  • * Corresponding author: Bin Wen

  • The complex ornamentation and distinctive composition of ethnic minority costumes limit the performance of current clothing image recognition algorithms. Models based on convolutional neural networks can extract deep semantic features from clothing images and perform better on datasets with more images, but they ignore the large-scale features of an image along each spatial dimension. We therefore propose an improved model based on the Vision Transformer: asymmetric convolutions extract image features along the height and width directions, these features are fed into the Transformer encoder for serialization and encoding, and the encoder output is used to produce the recognition result. With accuracy as the evaluation metric on a minority clothing dataset, the proposed method outperforms ResNet34 and is 1.2 percentage points more accurate than the classic Vision Transformer.

    Mathematics Subject Classification: Primary: 68T07, 68T45.
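
The asymmetric convolution mentioned in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' embedding layer: the kernel length `S`, the fixed averaging weights, the toy 4×4 input, and the helper name `conv2d_valid` are all assumptions made for illustration (a trained model would learn its kernel weights). The point is only that a 1×S kernel aggregates information along the width, while an S×1 kernel aggregates along the height.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Both arguments are lists of rows. As in deep learning frameworks,
    this is cross-correlation (the kernel is not flipped).
    """
    img_h, img_w = len(image), len(image[0])
    ker_h, ker_w = len(kernel), len(kernel[0])
    out = []
    for i in range(img_h - ker_h + 1):
        row = []
        for j in range(img_w - ker_w + 1):
            acc = 0.0
            for di in range(ker_h):
                for dj in range(ker_w):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


S = 3  # hypothetical kernel length, chosen only for this example

# Toy 4x4 single-channel "image" with values 0..15 (not data from the paper).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

# Asymmetric kernels: 1xS spans the width, Sx1 spans the height.
row_kernel = [[1.0 / S] * S]
col_kernel = [[1.0 / S] for _ in range(S)]

width_features = conv2d_valid(image, row_kernel)   # 4 rows x 2 columns
height_features = conv2d_valid(image, col_kernel)  # 2 rows x 4 columns
```

The two outputs have different shapes because each kernel shrinks the image along only one axis, which is what distinguishes an asymmetric kernel from a square S×S one.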

  • Figure 1.  Vision Transformer

    Figure 2.  Improved embedding layer, take convolution kernel 1×S as an example

    Figure 3.  Improved Transformer encoder

    Figure 4.  Improved model based on Vision Transformer

    Figure 5.  Accuracy changes on the training set

    Table 1.  Symbol definition

    Symbol    Definition
    ×         Multiplication of Vectors or Matrices
              Concatenation of Two Vectors
    +         Addition of Corresponding Elements of Two Matrices or Vectors

    Table 2.  Software and hardware environment used in the experiment

    CPU                       Intel Core i7-12700KF
    Host Memory               32 GB
    GPU                       NVIDIA GeForce RTX 3090
    GPU Memory                24 GB
    Operating System          Windows 11
    Programming Language      Python
    Deep Learning Framework   PyTorch
    Dependency Library        CUDA 11.3

    Table 3.  Definitions of TP and FN

    Prediction                        Number of Samples Belonging to the Current Class
    Predicted as the current class    TP
    Predicted as another class        FN

    Table 4.  Results on the Test Set

    Neural Network          Accuracy   Recall (Hani)   Recall (Wa)   Recall (Yi)   AUC
    ViT base                98.6%      99.12%          99.65%        90.24%        0.9863
    ViT Improvement         99.5%      100.00%         99.65%        97.56%        0.9994
    ViT Improvement+mask    99.8%      100.00%         100.00%       97.56%        0.9997
    Inception v3            99.1%      98.23%          99.31%        100.00%       0.9993
    ResNet34                99.3%      99.12%          99.31%        100.00%       0.9965
    DenseNet121             99.5%      100.00%         99.65%        97.56%        0.9981
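
Table 3 defines TP and FN per class, and Table 4 reports accuracy and per-class recall. The sketch below shows how such numbers could be computed, assuming the usual definition recall = TP / (TP + FN). The function names and the ten labels are hypothetical toy data, not results from the paper; only the class names Hani, Wa and Yi are taken from Table 4.

```python
def recall_per_class(y_true, y_pred, classes):
    """Return {class: TP / (TP + FN)} for each class in `classes`."""
    recalls = {}
    for cls in classes:
        # TP: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        # FN: samples of this class that were predicted as another class.
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        recalls[cls] = tp / (tp + fn) if (tp + fn) else float("nan")
    return recalls


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


classes = ["Hani", "Wa", "Yi"]

# Toy labels for illustration only.
y_true = ["Hani", "Hani", "Wa", "Wa", "Wa", "Yi", "Yi", "Yi", "Yi", "Yi"]
y_pred = ["Hani", "Hani", "Wa", "Wa", "Yi", "Yi", "Yi", "Yi", "Yi", "Hani"]

recalls = recall_per_class(y_true, y_pred, classes)
acc = accuracy(y_true, y_pred)

for cls in classes:
    print(f"recall({cls}) = {recalls[cls]:.2%}")
print(f"accuracy = {acc:.1%}")
```

Reporting recall per class alongside overall accuracy matters when classes are imbalanced: a model can reach high accuracy while still missing a noticeable share of the smallest class.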