CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy with a Novel Metric

  • Conference paper
  • In: Artificial Intelligence XL (SGAI 2023)

Abstract

Detecting visually similar images is particularly useful when computing product recommendations. Embedding similarity, which uses pre-trained computer vision models to extract high-level image features, has proven remarkably effective at identifying images with similar compositions. However, methods for evaluating the embeddings these models produce are lacking, as conventional loss and performance metrics do not adequately capture how well they perform in image similarity search.
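
As a concrete illustration of the embedding-extraction step mentioned above, the following is a minimal sketch and not code from the paper: the choice of ResNet-50, and every name in the snippet, are assumptions made for this example. The idea is simply to replace a pre-trained model's classification head with an identity layer so that it outputs feature vectors.

# Illustrative sketch only, not code from the paper: extract embedding vectors from a
# pre-trained Torchvision model by replacing its classification head with an identity
# layer. ResNet-50 is an assumption; the paper evaluates many Torchvision models.
import torch
from torchvision import models

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.fc = torch.nn.Identity()      # keep the 2048-d pooled features, drop the classifier
model.eval()

preprocess = weights.transforms()   # resize/crop/normalisation tied to these weights

with torch.no_grad():
    images = torch.rand(4, 3, 256, 256)       # stand-in for a batch of product photos
    embeddings = model(preprocess(images))    # feature vectors of shape (4, 2048)
print(embeddings.shape)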

In this paper, we evaluate the viability of image embeddings from numerous pre-trained computer vision models using a novel approach named CorrEmbed. Our approach computes the correlation between distances in image-embedding space and distances between human-generated tag vectors. We extensively evaluate pre-trained Torchvision models with this metric, revealing an intuitive, roughly linear relationship between ImageNet1k accuracy scores and tag-correlation scores. Importantly, our method also identifies deviations from this pattern, providing insight into how different models capture high-level image features.
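
The tag-correlation computation described above might look roughly like the sketch below. This is not the authors' implementation (see the linked repository for that); the use of cosine distances and Spearman rank correlation, and all names here, are illustrative assumptions.

# Hypothetical sketch of a CorrEmbed-style score, not the authors' code: correlate
# pairwise distances in embedding space with pairwise distances between tag vectors.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def tag_correlation_score(embeddings: np.ndarray, tag_vectors: np.ndarray) -> float:
    """embeddings: (n, d_embed) model features; tag_vectors: (n, d_tags) human tags."""
    emb_dists = pdist(embeddings, metric="cosine")    # condensed pairwise distances
    tag_dists = pdist(tag_vectors, metric="cosine")
    rho, _ = spearmanr(emb_dists, tag_dists)          # rank correlation of the two
    return float(rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dummy_embeddings = rng.normal(size=(100, 512))                 # stand-in features
    dummy_tags = rng.integers(0, 2, size=(100, 40)).astype(float)  # multi-hot tag vectors
    print(f"tag-correlation score: {tag_correlation_score(dummy_embeddings, dummy_tags):.3f}")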

By offering a robust performance evaluation of these pre-trained models, CorrEmbed serves as a valuable tool for researchers and practitioners seeking to develop effective, data-driven approaches to similar item recommendations in fashion retail. All code and experiments are openly available at https://github.com/cair/CorrEmbed_Evaluating_Pre-trained_Model_Efficacy/tree/main.

Notes

  1. Tise: https://tise.com/.

  2. TorchVision’s model set is available at https://pytorch.org/vision/stable/models.html.

  3. https://en.wikipedia.org/wiki/Pareto_principle.

Acknowledgments

Funded by the Research Council of Norway through the project “Your green, smart and endless wardrobe”, project number 309977. We thank FJONG for providing the data used as a basis for the dataset in this paper.

Author information

Corresponding author

Correspondence to Karl Audun Kagnes Borgersen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Borgersen, K.A.K., Goodwin, M., Sharma, J., Aasmoe, T., Leonhardsen, M., Rørvik, G.H. (2023). CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy with a Novel Metric. In: Bramer, M., Stahl, F. (eds) Artificial Intelligence XL. SGAI 2023. Lecture Notes in Computer Science, vol. 14381. Springer, Cham. https://doi.org/10.1007/978-3-031-47994-6_7

  • DOI: https://doi.org/10.1007/978-3-031-47994-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47993-9

  • Online ISBN: 978-3-031-47994-6

  • eBook Packages: Computer Science, Computer Science (R0)
