
SiamYOLOv8: a rapid conditional detection framework for one-shot object detection

Published in: Applied Intelligence

Abstract

Deep learning networks typically require vast amounts of labeled data for effective training. However, recent research has introduced a challenging task called One-Shot Object Detection, which addresses scenarios in which certain classes are novel, unseen during training, and represented by only a single labeled example. In this paper, we propose a novel One-Shot Object Detection model applicable to Conditional Detection without over-training on novel classes. Our approach leverages the strengths of YOLOv8 (You Only Look Once v8), a popular real-time object detector. Specifically, we incorporate a Siamese network and a matching module to provide One-Shot Object Detection capabilities. The resulting model, SiamYOLOv8, enables exploration of new applications without being limited by its training data. To evaluate performance, we introduce a novel methodology for using the Retail Product Checkout (RPC) dataset (https://github.com/MatD3mons/Conditional-Detection-datasets/tree/main/RPC) and extend our evaluation with the Grozi-3.2k dataset (https://github.com/MatD3mons/Conditional-Detection-datasets/tree/main/GROZI-3.2k). In such retail contexts, new products often lack sufficient data for conventional deep learning methods, making individual product identification difficult. Our model outperforms state-of-the-art models, improving Average Precision by 20.33% (+12.41 AP) on the Grozi-3.2k dataset and by 25.68% (+17.37 AP) on the RPC dataset.
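The paper's implementation is not reproduced here, but the core idea of a Siamese matching module, comparing an embedding of the single support (exemplar) image against every spatial location of the query feature map, can be sketched as follows. This is a minimal illustration using cosine similarity over NumPy arrays; the function name, shapes, and similarity choice are assumptions for the sketch, not the authors' SiamYOLOv8 architecture.

```python
import numpy as np

def match_features(query_fmap, support_vec):
    """Cosine similarity between each spatial cell of a query feature map
    (C, H, W) and a single support embedding (C,), returning an (H, W)
    similarity heatmap. High values indicate likely exemplar locations."""
    c, h, w = query_fmap.shape
    q = query_fmap.reshape(c, -1)                            # (C, H*W)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    s = support_vec / (np.linalg.norm(support_vec) + 1e-8)
    sim = s @ q                                              # (H*W,)
    return sim.reshape(h, w)

# Toy example: plant the support embedding at one cell of a random map.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 4, 4))
support = fmap[:, 2, 1].copy()
heat = match_features(fmap, support)
peak = np.unravel_index(np.argmax(heat), heat.shape)  # recovers (2, 1)
```

In a detector such as the one described, a heatmap like this would condition the detection head so that only regions matching the support exemplar are kept, which is what allows novel classes to be detected from a single labeled example without retraining.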



Data Availability

The data supporting this study will be made available upon publication of the paper.

Code Availability

Not applicable.


Acknowledgements

The authors would like to thank the Generix Group Company for financial support and the L@bISEN laboratory of the ISEN Yncrea Ouest for scientific support.

Funding

This work is funded by our partner Generix (world leader in the SaaS industry), as part of their project to add AI solutions to their existing warehouse management product.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed significantly to the work, in terms of content, analysis, writing, and revision of the manuscript.

Corresponding author

Correspondence to Matthieu Desmarescaux.

Ethics declarations

Conflict of Interest

The authors declare that they have no competing financial interests or personal relationships that could have influenced the work reported in this paper.

Consent for Publication

All authors consent to the publication of this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Desmarescaux, M., Kaddah, W., Alfalou, A. et al. SiamYOLOv8: a rapid conditional detection framework for one-shot object detection. Appl Intell 55, 609 (2025). https://doi.org/10.1007/s10489-025-06513-2

