Abstract
Visual similarity between images is a particularly useful signal when computing product recommendations. Embedding similarity, which uses pre-trained computer vision models to extract high-level image features, has demonstrated remarkable efficacy in identifying images with similar compositions. However, few methods exist for evaluating the embeddings these models generate, as conventional loss and performance metrics do not adequately capture their performance in image similarity search tasks.
In this paper, we evaluate the viability of the image embeddings from numerous pre-trained computer vision models using a novel approach named CorrEmbed. Our approach computes the correlation between distances in image embeddings and distances in human-generated tag vectors. We evaluate a broad set of pre-trained Torchvision models using this metric, revealing an intuitive linear relationship between ImageNet1k accuracy scores and tag-correlation scores. Importantly, our method also identifies deviations from this pattern, providing insights into how different models capture high-level image features.
By offering a robust performance evaluation of these pre-trained models, CorrEmbed serves as a valuable tool for researchers and practitioners seeking to develop effective, data-driven approaches to similar item recommendations in fashion retail. All code and experiments are openly available at https://github.com/cair/CorrEmbed_Evaluating_Pre-trained_Model_Efficacy/tree/main.
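The correlation described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that); it assumes cosine distance over all image pairs and a Pearson correlation, neither of which is specified in the abstract, and the function names and toy data are hypothetical.

```python
import numpy as np


def pairwise_cosine_distances(x):
    """Return cosine distances for every unordered pair of rows in x."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against all-zero rows
    dist = 1.0 - unit @ unit.T
    i, j = np.triu_indices(len(x), k=1)  # upper triangle, no diagonal
    return dist[i, j]


def tag_correlation(embeddings, tag_vectors):
    """Pearson correlation between embedding distances and tag distances.

    Assumption: cosine distance and Pearson correlation are illustrative
    choices, not necessarily those used in the paper.
    """
    d_emb = pairwise_cosine_distances(embeddings)
    d_tag = pairwise_cosine_distances(tag_vectors)
    return float(np.corrcoef(d_emb, d_tag)[0, 1])


# Toy data (hypothetical): 50 items described by 12 binary tags.
rng = np.random.default_rng(0)
tags = rng.integers(0, 2, size=(50, 12)).astype(float)

# "Aligned" embeddings are a noisy linear projection of the tags, so their
# distances should track tag distances; "random" embeddings should not.
projection = rng.normal(size=(12, 64))
aligned = tags @ projection + 0.1 * rng.normal(size=(50, 64))
random_emb = rng.normal(size=(50, 64))

aligned_score = tag_correlation(aligned, tags)
random_score = tag_correlation(random_emb, tags)
print(f"aligned: {aligned_score:.3f}, random: {random_score:.3f}")
```

In practice the `embeddings` array would hold feature vectors extracted from a pre-trained model, and a higher score would suggest that the model's notion of similarity agrees more closely with the human-generated tags.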
Notes
- 1. Tise: https://tise.com/.
- 2. TorchVision’s model set is available at https://pytorch.org/vision/stable/models.html.
- 3.
Acknowledgments
Funded by the Research Council of Norway through the project “Your green, smart and endless wardrobe”, project number 309977. We thank FJONG for providing the data used as a basis for the dataset in this paper.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Borgersen, K.A.K., Goodwin, M., Sharma, J., Aasmoe, T., Leonhardsen, M., Rørvik, G.H. (2023). CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy with a Novel Metric. In: Bramer, M., Stahl, F. (eds) Artificial Intelligence XL. SGAI 2023. Lecture Notes in Computer Science, vol 14381. Springer, Cham. https://doi.org/10.1007/978-3-031-47994-6_7
DOI: https://doi.org/10.1007/978-3-031-47994-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47993-9
Online ISBN: 978-3-031-47994-6
eBook Packages: Computer Science, Computer Science (R0)