Item-region-based style classification network (IRSN): a fashion style classifier based on domain knowledge of fashion experts

Choi, Jinyoung; Kwon, Youngchae; Kim, Injung

doi:10.1007/s10489-024-05683-9

Published: 18 July 2024

Volume 54, pages 9579–9593 (2024)

Applied Intelligence

Abstract

Fashion style is expressed through the way clothing and accessories are put together, as well as the silhouettes, textiles, colors, and shape details of each fashion item. The challenge of style classification lies in the wide visual variation within the same style and the existence of visually similar styles. Fashion experts categorize fashion styles not only by global appearance but also by the attributes of individual items and their combinations. We propose an item-region-based fashion style classification network (IRSN) that classifies fashion styles by analyzing item-level features and their combinations. IRSN extracts item features using item region pooling (IRP), analyzes them separately, and aggregates them using gated feature fusion (GFF). In addition, IRSN applies a dual-backbone architecture that combines a domain-specific feature extractor with a general feature extractor pretrained on a large general image-text dataset. In our experiments, we evaluated IRSN variants based on six widely used backbones, including EfficientNet, ConvNeXt, and Swin Transformer. The IRSN models outperformed their baseline models by an average of 8.9% and a maximum of 16.7% on the FashionStyle14 dataset, and by an average of 9.4% and a maximum of 17.0% on the ShowniqV3 dataset. The visualization results indicate that the IRSN models capture differences between similar style classes more effectively than the baseline models.
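
As an illustration of the two operations named in the abstract, the following minimal Python sketch pools a feature map over item regions and then fuses the pooled item features with gates. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: masked average pooling stands in for IRP, a scalar sigmoid gate per item stands in for GFF, and the names `item_region_pool`, `gated_feature_fusion`, `gate_weights`, and `gate_biases` are hypothetical rather than taken from the paper.

```python
import math


def item_region_pool(feature_map, mask):
    """Average the feature vectors at positions where mask is 1.

    feature_map: H x W x C nested lists of floats.
    mask: H x W nested lists of 0/1 marking one item region (assumed given).
    """
    channels = len(feature_map[0][0])
    total = [0.0] * channels
    count = 0
    for row_feats, row_mask in zip(feature_map, mask):
        for feat, inside in zip(row_feats, row_mask):
            if inside:
                count += 1
                for c in range(channels):
                    total[c] += feat[c]
    if count == 0:
        # Assumption: an absent item contributes a zero vector.
        return [0.0] * channels
    return [t / count for t in total]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_feature_fusion(item_feats, gate_weights, gate_biases):
    """Weight each item feature by a scalar gate and sum the results.

    Assumed gate: g_i = sigmoid(w_i . f_i + b_i); output: sum_i g_i * f_i.
    """
    channels = len(item_feats[0])
    fused = [0.0] * channels
    for feat, w, b in zip(item_feats, gate_weights, gate_biases):
        gate = sigmoid(sum(wi * fi for wi, fi in zip(w, feat)) + b)
        for c in range(channels):
            fused[c] += gate * feat[c]
    return fused


# Toy 2 x 2 feature map with 2 channels and two hypothetical item regions.
feature_map = [[[1.0, 2.0], [3.0, 4.0]],
               [[5.0, 6.0], [7.0, 8.0]]]
top_mask = [[1, 1], [0, 0]]
bottom_mask = [[0, 0], [1, 1]]

item_feats = [item_region_pool(feature_map, top_mask),
              item_region_pool(feature_map, bottom_mask)]
fused = gated_feature_fusion(item_feats,
                             gate_weights=[[0.0, 0.0], [0.0, 0.0]],
                             gate_biases=[0.0, 0.0])
```

With zero gate weights and biases, every gate equals 0.5, so `fused` is half the sum of the two pooled item features. In a trained model the gate parameters would be learned, letting the network emphasize the items that matter most for a given style.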

Data availability and access

FashionStyle14 [4] is available from https://esslab.jp/~ess/en/data/fashionstyle14/. ShowniqV3 is not publicly available because it was collected for a commercial service.

References

Lee MG, Kim HJ (2021) Analysis of the sales promotion strategy of online fashion shopping mall. Korea Inst Cult Prod Des 64:227–240
Kennedy A, Stoehrer EB, Calderin J (2013) Fashion Design, Referenced: A Visual Guide to the History, Language, and Practice of Fashion. Rockport Publishers, Beverly, Mass
Sorger R, Udale J (2006) The Fundamentals of Fashion Design. AVA Publishing, Worthing, West Sussex, United Kingdom
Takagi M, Simo-Serra E, Iizuka S, Ishikawa H (2017) What Makes a Style: Experimental Analysis of Fashion Prediction. In: Proceedings of the international conference on computer vision workshops (ICCVW). https://doi.org/10.1109/ICCVW.2017.263
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning, pp 6105–6114. PMLR
Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X et al (2020) Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell 43(10):3349–3364
Liu Z, Mao H, Wu C-Y, Feichtenhofer C, Darrell T, Xie S (2022) A convnet for the 2020s. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11976–11986
Woo S, Debnath S, Hu R, Chen X, Liu Z, Kweon IS, Xie S (2023) Convnext v2: Co-designing and scaling convnets with masked autoencoders. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16133–16142
Wang W, Dai J, Chen Z, Huang Z, Li Z, Zhu X, Hu X, Lu T, Lu L, Li H et al (2023) Internimage: Exploring large-scale vision foundation models with deformable convolutions. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14408–14419
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Advances in neural information processing systems 30
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. In: International conference on learning representations
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022
Liu Z, Hu H, Lin Y, Yao Z, Xie Z, Wei Y, Ning J, Cao Y, Zhang Z, Dong L et al (2022) Swin transformer v2: Scaling up capacity and resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12009–12019
Mishra S, Liang P, Czajka A, Chen DZ, Hu XS (2019) Cc-net: Image complexity guided network compression for biomedical image segmentation. In: 2019 IEEE 16th International symposium on biomedical imaging (ISBI 2019), pp 57–60. IEEE
Sun G-L, Wu X, Chen H-H, Peng Q (2015) Clothing style recognition using fashion attribute detection. In: Proceedings of the 8th international conference on mobile multimedia communications. MobiMedia ’15, pp 145–148. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels, BEL
Liu Z, Luo P, Qiu S, Wang X, Tang X (2016) Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1096–1104
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR. arXiv:1704.04861
Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W (2019) Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 603–612
Wan Q, Huang Z, Lu J, Gang Y, Zhang L (2022) Seaformer: Squeeze-enhanced axial transformer for mobile semantic segmentation. In: The eleventh international conference on learning representations
Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2021) Training data-efficient image transformers & distillation through attention. In: International conference on machine learning, pp 10347–10357. PMLR
Dai Z, Liu H, Le QV, Tan M (2021) Coatnet: Marrying convolution and attention for all data sizes. Adv Neural Inf Process Syst 34:3965–3977
Park N, Kim S (2021) How do vision transformers work? In: International conference on learning representations
Kim S, Choi Y, Park J (2021) Recognition of multi label fashion styles based on transfer learning and graph convolution network. J Soc e-Bus Stud 26(1):29–41. https://doi.org/10.7838/jsebs.2021.26.1.029
Chen X, Deng Y, Di C, Li H, Tang G, Cai H (2022) High-accuracy clothing and style classification via multi-feature fusion. Appl Sci 12(19):10062. https://doi.org/10.3390/app121910062
Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19
Hendrycks D, Lee K, Mazeika M (2019) Using pre-training can improve model robustness and uncertainty. In: International conference on machine learning, pp 2712–2721. PMLR
He K, Girshick R, Dollár P (2019) Rethinking imagenet pre-training. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4918–4927
Ke A, Ellsworth W, Banerjee O, Ng AY, Rajpurkar P (2021) Chextransfer: performance and parameter efficiency of imagenet models for chest x-ray interpretation. In: Proceedings of the conference on health, inference, and learning, pp 116–124
Marmanis D, Datcu M, Esch T, Stilla U (2015) Deep learning earth observation classification using imagenet pretrained networks. IEEE Geosci Remote Sens Lett 13(1):105–109
Li A, Jabri A, Joulin A, Van Der Maaten L (2017) Learning visual n-grams from web data. In: Proceedings of the IEEE international conference on computer vision, pp 4183–4192
Joulin A, Van Der Maaten L, Jabri A, Vasilache N (2016) Learning visual features from large weakly supervised data. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14, pp 67–84. Springer
Zhang Y, Jiang H, Miura Y, Manning CD, Langlotz CP (2022) Contrastive learning of medical visual representations from paired images and text. In: Machine learning for healthcare conference, pp 2–25. PMLR
Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J et al (2021) Learning transferable visual models from natural language supervision. In: International conference on machine learning, pp 8748–8763. PMLR
Lüddecke T, Ecker A (2022) Image segmentation using text and image prompts. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7086–7096
Kumari T, Syal P, Aggarwal AK, Guleria V (2020) Hybrid image registration methods: a review. Int J Adv Trends Comput Sci Eng 9:1134–1142
Maini D, Aggarwal AK (2018) Camera position estimation using 2d image dataset. Int J Innov Eng Technol 10:199–203
Arora K, Kumar A (2017) A comparative study on content based image retrieval methods. Int J Latest Technol Eng Manag Appl Sci 6(4):77–80
Arora K, Aggarwal AK (2017) Approaches for image database retrieval based on color, texture, and shape features. Handbook of research on advanced concepts in real-time image and video processing, 28
Aggarwal AK (2022) Learning texture features from glcm for classification of brain tumor mri images using random forest classifier. Trans Signal Process 18:60–63
Kumari T, Guleria V, Syal P, Aggarwal AK (2021) A feature cum intensity based ssim optimised hybrid image registration technique. In: 2021 International conference on computing, communication and green engineering (CCGE), pp 1–8. IEEE
https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626
Liu X, Zhu X, Li M, Wang L, Zhu E, Liu T, Kloft M, Shen D, Yin J, Gao W (2019) Multiple kernel k-means with incomplete kernels. IEEE Trans Pattern Anal Mach Intell 42(5):1191–1204
Zhou Z, Zhang B, Yu X (2022) Immune coordination deep network for hand heat trace extraction. Infrared Phys Technol 127:104400
Yu X, Ye X, Zhang S (2022) Floating pollutant image target extraction algorithm based on immune extremum region. Digit Signal Process 123:103442
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

Acknowledgements

This research was supported by Deep Fashion Co., Ltd and the MSIT (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation) in 2023 (2023-0-00055).

Author information

Authors and Affiliations

Dept. of CSEE, Handong Graduate School, Handong Global Univ., 558 Handong-ro Buk-gu, Pohang, 37554, Gyeongbuk, Republic of Korea
Jinyoung Choi & Youngchae Kwon
School of CSEE, Handong Global University, 558 Handong-ro Buk-gu, Pohang, 37554, Gyeongbuk, Republic of Korea
Injung Kim

Contributions

All authors contributed to the conception and design of the study. Jinyoung Choi mainly developed the model and performed the experiments together with Youngchae Kwon under the supervision of Injung Kim. The manuscript was drafted, revised, and approved by all authors.

Corresponding author

Correspondence to Injung Kim.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Choi, J., Kwon, Y. & Kim, I. Item-region-based style classification network (IRSN): a fashion style classifier based on domain knowledge of fashion experts. Appl Intell 54, 9579–9593 (2024). https://doi.org/10.1007/s10489-024-05683-9

Accepted: 09 July 2024
Published: 18 July 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s10489-024-05683-9
