Abstract
Deep learning networks typically require vast amounts of labeled data for effective training. However, recent research has introduced a challenging task, One-Shot Object Detection, which addresses scenarios where certain classes are novel, unseen during training, and represented by only a single labeled example. In this paper, we propose a novel One-Shot Object Detection model applicable to Conditional Detection without over-training on novel classes. Our approach leverages the strengths of YOLOv8 (You Only Look Once v8), a popular real-time object detector, incorporating a Siamese network and a matching module to enable One-Shot Object Detection. The resulting model, SiamYOLOv8, supports exploration of new applications without being limited by its training data. To evaluate performance, we introduce a novel methodology for using the Retail Product Checkout (RPC) dataset (https://github.com/MatD3mons/Conditional-Detection-datasets/tree/main/RPC) and extend our evaluation with the Grozi-3.2k dataset (https://github.com/MatD3mons/Conditional-Detection-datasets/tree/main/GROZI-3.2k). In such retail contexts, new products often lack sufficient data for conventional deep learning methods, making individual case identification difficult. Our model outperforms state-of-the-art models, achieving a 20.33% relative improvement in Average Precision (+12.41 AP) on the Grozi-3.2k dataset and a 25.68% improvement (+17.37 AP) on the RPC dataset.
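The matching step the abstract describes, comparing the embedding of a single labeled support example against embeddings of candidate detections in the query image, can be sketched as follows. This is a minimal illustration only: the function names, the cosine-similarity score, and the threshold are assumptions for exposition, not the paper's exact matching module.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def match_detections(support_emb, region_embs, threshold=0.6):
    """Score each candidate region against the single support embedding
    and keep the indices of regions whose similarity clears the threshold."""
    s = l2_normalize(support_emb)
    r = l2_normalize(region_embs)
    sims = r @ s                      # cosine similarity per candidate region
    keep = np.where(sims >= threshold)[0]
    return keep, sims

# Toy example: one 4-dim support embedding, three candidate region embeddings.
support = np.array([1.0, 0.0, 0.0, 0.0])
regions = np.array([
    [0.9, 0.1, 0.0, 0.0],   # close to the support example
    [0.0, 1.0, 0.0, 0.0],   # orthogonal (a different product)
    [0.7, 0.7, 0.0, 0.0],   # partial match
])
keep, sims = match_detections(support, regions)  # keeps regions 0 and 2
```

In the full model, `support_emb` and `region_embs` would come from the shared-weight Siamese branches rather than being given directly, and the score would feed the detector's classification head instead of a hard threshold.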


Data Availability
The data supporting this study will be made available upon publication of the paper.
Code Availability
Not applicable
Acknowledgements
The authors would like to thank the Generix Group Company for financial support and the L@bISEN laboratory of the ISEN Yncrea Ouest for scientific support.
Funding
This work is funded by our partner Generix (a world leader in the SaaS industry) as part of their project to add AI solutions to their existing warehouse management product.
Author information
Contributions
All authors contributed significantly to the work, in terms of content, analysis, writing, and revision of the manuscript.
Ethics declarations
Conflict of Interest
The authors declare that they have no competing financial interests or personal relationships that could have influenced the work reported in this paper.
Consent for Publication
All authors consent to the publication of this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Desmarescaux, M., Kaddah, W., Alfalou, A. et al. SiamYOLOv8: a rapid conditional detection framework for one-shot object detection. Appl Intell 55, 609 (2025). https://doi.org/10.1007/s10489-025-06513-2