
Item-region-based style classification network (IRSN): a fashion style classifier based on domain knowledge of fashion experts

Published in Applied Intelligence

Abstract

Fashion style is expressed through the way clothing and accessories are combined, as well as through the silhouettes, textiles, colors, and shape details of each fashion item. The challenge of style classification lies in the wide visual variation within a single style and the existence of visually similar styles. Fashion experts categorize fashion styles not only by global appearance but also by the attributes of individual items and their combinations. We propose an item-region-based fashion style classification network (IRSN) that classifies fashion styles effectively by analyzing item-level features and their combinations. IRSN extracts item features using item region pooling (IRP), analyzes them separately, and aggregates them using gated feature fusion (GFF). In addition, IRSN adopts a dual-backbone architecture that combines a domain-specific feature extractor with a general feature extractor pretrained on a large general image-text dataset. In our experiments, we evaluated IRSN variants built on six widely used backbones, including EfficientNet, ConvNeXt, and Swin Transformer. The IRSN models outperformed their baselines by an average of 8.9% and a maximum of 16.7% on the FashionStyle14 dataset, and by an average of 9.4% and a maximum of 17.0% on the ShowniqV3 dataset. Visualization results further indicate that the IRSN models capture differences between similar style classes more effectively than the baseline models.
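The paper's implementation is not reproduced on this page. Purely as an illustration of the mechanisms the abstract names, the following is a minimal PyTorch sketch of item region pooling (IRP) and gated feature fusion (GFF); all module names, tensor shapes, and the gating formulation are assumptions made for illustration, not the authors' code.

```python
# Hypothetical sketch of two mechanisms named in the abstract: item region
# pooling (IRP) and gated feature fusion (GFF). Written from the abstract's
# description only; names, shapes, and the gating rule are assumptions.
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class ItemRegionPooling(nn.Module):
    """Pool a fixed-size feature vector from each item region (assumed RoIAlign)."""

    def __init__(self, output_size: int = 1, spatial_scale: float = 1.0 / 32):
        super().__init__()
        self.output_size = output_size
        # spatial_scale maps image-space boxes onto the backbone feature map
        self.spatial_scale = spatial_scale

    def forward(self, feature_map: torch.Tensor, boxes: list) -> torch.Tensor:
        # feature_map: (B, C, H, W); boxes: list of (num_items, 4) tensors per image
        pooled = roi_align(feature_map, boxes, self.output_size, self.spatial_scale)
        return pooled.flatten(1)  # (total_items, C)


class GatedFeatureFusion(nn.Module):
    """Aggregate per-item features with learned gates (one plausible formulation)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)

    def forward(self, item_feats: torch.Tensor) -> torch.Tensor:
        # item_feats: (num_items, C) for one outfit
        weights = torch.sigmoid(self.gate(item_feats))  # (num_items, 1)
        fused = (weights * item_feats).sum(0) / weights.sum(0).clamp_min(1e-6)
        return fused  # (C,) outfit-level style feature, fed to the classifier
```

Per the abstract, a full IRSN would additionally run a domain-specific backbone and a general image-text-pretrained backbone in parallel and fuse their features before classification; that dual-backbone wiring is omitted from this sketch.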




Data availability and access

FashionStyle14 [4] is available at https://esslab.jp/~ess/en/data/fashionstyle14/. ShowniqV3 is not publicly available because it was collected for a commercial service.

References

  1. Lee MG, Kim HJ (2021) Analysis of the sales promotion strategy of online fashion shopping mall. Korea Inst Cult Prod Des 64:227–240


  2. Kennedy A, Stoehrer EB, Calderin J (2013) Fashion Design, Referenced: A Visual Guide to the History, Language, and Practice of Fashion. Rockport Publishers, Beverly, Mass


  3. Sorger R, Udale J (2006) The Fundamentals of Fashion Design. AVA Publishing, Worthing, West Sussex, United Kingdom


  4. Takagi M, Simo-Serra E, Iizuka S, Ishikawa H (2017) What Makes a Style: Experimental Analysis of Fashion Prediction. In: Proceedings of the international conference on computer vision workshops (ICCVW). https://doi.org/10.1109/ICCVW.2017.263

  5. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

  6. Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning, pp 6105–6114. PMLR

  7. Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X et al (2020) Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell 43(10):3349–3364


  8. Liu Z, Mao H, Wu C-Y, Feichtenhofer C, Darrell T, Xie S (2022) A convnet for the 2020s. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11976–11986

  9. Woo S, Debnath S, Hu R, Chen X, Liu Z, Kweon IS, Xie S (2023) Convnext v2: Co-designing and scaling convnets with masked autoencoders. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16133–16142

  10. Wang W, Dai J, Chen Z, Huang Z, Li Z, Zhu X, Hu X, Lu T, Lu L, Li H et al (2023) Internimage: Exploring large-scale vision foundation models with deformable convolutions. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14408–14419

  11. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30

  12. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. In: International conference on learning representations

  13. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022

  14. Liu Z, Hu H, Lin Y, Yao Z, Xie Z, Wei Y, Ning J, Cao Y, Zhang Z, Dong L et al (2022) Swin transformer v2: Scaling up capacity and resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12009–12019

  15. Mishra S, Liang P, Czajka A, Chen DZ, Hu XS (2019) Cc-net: Image complexity guided network compression for biomedical image segmentation. In: 2019 IEEE 16th International symposium on biomedical imaging (ISBI 2019), pp 57–60. IEEE

  16. Sun G-L, Wu X, Chen H-H, Peng Q (2015) Clothing style recognition using fashion attribute detection. In: Proceedings of the 8th international conference on mobile multimedia communications. MobiMedia ’15, pp 145–148. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels, BEL

  17. Liu Z, Luo P, Qiu S, Wang X, Tang X (2016) Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1096–1104

  18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  19. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR. arXiv:1704.04861

  20. Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W (2019) Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 603–612

  21. Wan Q, Huang Z, Lu J, Gang Y, Zhang L (2022) Seaformer: Squeeze-enhanced axial transformer for mobile semantic segmentation. In: The eleventh international conference on learning representations

  22. Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2021) Training data-efficient image transformers & distillation through attention. In: International conference on machine learning, pp 10347–10357. PMLR

  23. Dai Z, Liu H, Le QV, Tan M (2021) Coatnet: Marrying convolution and attention for all data sizes. Adv Neural Inf Process Syst 34:3965–3977


  24. Park N, Kim S (2021) How do vision transformers work? In: International conference on learning representations

  25. Kim S, Choi Y, Park J (2021) Recognition of multi label fashion styles based on transfer learning and graph convolution network. J Soc e-Bus Stud 26(1):29–41. https://doi.org/10.7838/jsebs.2021.26.1.029


  26. Chen X, Deng Y, Di C, Li H, Tang G, Cai H (2022) High-accuracy clothing and style classification via multi-feature fusion. Appl Sci 12(19):10062. https://doi.org/10.3390/app121910062


  27. Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19

  28. Hendrycks D, Lee K, Mazeika M (2019) Using pre-training can improve model robustness and uncertainty. In: International conference on machine learning, pp 2712–2721. PMLR

  29. He K, Girshick R, Dollár P (2019) Rethinking imagenet pre-training. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4918–4927

  30. Ke A, Ellsworth W, Banerjee O, Ng AY, Rajpurkar P (2021) Chextransfer: performance and parameter efficiency of imagenet models for chest x-ray interpretation. In: Proceedings of the conference on health, inference, and learning, pp 116–124

  31. Marmanis D, Datcu M, Esch T, Stilla U (2015) Deep learning earth observation classification using imagenet pretrained networks. IEEE Geosci Remote Sens Lett 13(1):105–109


  32. Li A, Jabri A, Joulin A, Van Der Maaten L (2017) Learning visual n-grams from web data. In: Proceedings of the IEEE international conference on computer vision, pp 4183–4192

  33. Joulin A, Van Der Maaten L, Jabri A, Vasilache N (2016) Learning visual features from large weakly supervised data. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14, pp 67–84. Springer

  34. Zhang Y, Jiang H, Miura Y, Manning CD, Langlotz CP (2022) Contrastive learning of medical visual representations from paired images and text. In: Machine learning for healthcare conference, pp 2–25. PMLR

  35. Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J et al (2021) Learning transferable visual models from natural language supervision. In: International conference on machine learning, pp 8748–8763. PMLR

  36. Lüddecke T, Ecker A (2022) Image segmentation using text and image prompts. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7086–7096

  37. Kumari T, Syal P, Aggarwal AK, Guleria V (2020) Hybrid image registration methods: a review. Int J Adv Trends Comput Sci Eng 9:1134–1142


  38. Maini D, Aggarwal AK (2018) Camera position estimation using 2d image dataset. Int J Innov Eng Technol 10:199–203


  39. Arora K, Kumar A (2017) A comparative study on content based image retrieval methods. Int J Latest Technol Eng Manag Appl Sci 6(4):77–80


  40. Arora K, Aggarwal AK (2017) Approaches for image database retrieval based on color, texture, and shape features. Handbook of research on advanced concepts in real-time image and video processing, 28

  41. Aggarwal AK (2022) Learning texture features from glcm for classification of brain tumor mri images using random forest classifier. Trans Signal Process 18:60–63


  42. Kumari T, Guleria V, Syal P, Aggarwal AK (2021) A feature cum intensity based ssim optimised hybrid image registration technique. In: 2021 International conference on computing, communication and green engineering (CCGE), pp 1–8. IEEE

  43. PyTorch documentation: torch.nn.AdaptiveAvgPool2d. https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html

  44. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626

  45. Liu X, Zhu X, Li M, Wang L, Zhu E, Liu T, Kloft M, Shen D, Yin J, Gao W (2019) Multiple kernel k-means with incomplete kernels. IEEE Trans Pattern Anal Mach Intell 42(5):1191–1204


  46. Zhou Z, Zhang B, Yu X (2022) Immune coordination deep network for hand heat trace extraction. Infrared Phys Technol 127:104400


  47. Yu X, Ye X, Zhang S (2022) Floating pollutant image target extraction algorithm based on immune extremum region. Digit Signal Process 123:103442

  48. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848

  49. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826


Acknowledgements

This research was supported by Deep Fashion Co., Ltd. and by the MSIT (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW, supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation) in 2023 (2023-0-00055).

Author information


Contributions

All authors contributed to the conception and design of the study. Jinyoung Choi mainly developed the model and performed the experiments together with Youngchae Kwon under the supervision of Injung Kim. The manuscript was drafted, revised, and approved by all authors.

Corresponding author

Correspondence to Injung Kim.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Choi, J., Kwon, Y. & Kim, I. Item-region-based style classification network (IRSN): a fashion style classifier based on domain knowledge of fashion experts. Appl Intell 54, 9579–9593 (2024). https://doi.org/10.1007/s10489-024-05683-9


