short-paper

Once Learning for Looking and Identifying Based on YOLO-v5 Object Detection

Authors:

Mylène C. Q. Farias,

Li Weigang

WebMedia '22: Proceedings of the Brazilian Symposium on Multimedia and the Web

Pages 298 - 304

https://doi.org/10.1145/3539637.3557929

Published: 07 November 2022

Abstract

Object detection is an essential capability of computer vision solutions. Over the last years, it has gained attention as a core component of “Once learning” and “Few-shot learning” mechanisms. This research analyzes the ability of the machine learning framework “You Only Look Once” (YOLO) to perform object localization in a “Heuristic once learning” context. It also studies the advantages and practical limitations of YOLO by experimenting with two implementations: 1) the simplest one (a.k.a. tiny YOLO) and 2) the first version of YOLO. The case studies cover various visual data types and object contexts, such as object deformation caused by fast-forwarded frames, spatial distortion caused by isometric projection, and gaming images with abnormal objects. Finally, we build a dataset for a new task, called “Heuristic once learning”. Results using YOLO-v5 under such conditions showed that YOLO had difficulty generalizing simple abstractions of the characters, pointing to the need for new approaches to such challenges.
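The abstract centers on YOLO's object localization task. As a minimal illustration, and not the authors' code or evaluation protocol, the sketch below computes Intersection over Union (IoU), the overlap measure YOLO-style detectors commonly use to compare a predicted bounding box with a ground-truth box (the original YOLO paper uses IoU in its confidence target). The function names, the corner-format box convention, and the 0.5 hit threshold are assumptions for illustration; the converter reflects the normalized center/width/height boxes used in YOLO-v5 label files.

```python
from typing import Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def xywh_to_xyxy(cx: float, cy: float, w: float, h: float) -> Box:
    """Convert a center/size box (as in YOLO-style label files) to corner format."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes in corner format."""
    # Overlap rectangle; width or height is clamped to 0 when the boxes are disjoint
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Guard against degenerate (zero-area) boxes
    return inter / union if union > 0 else 0.0


def is_correct_localization(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Treat a prediction as a hit when its IoU with the ground truth reaches the (assumed) threshold."""
    return iou(pred, truth) >= threshold


if __name__ == "__main__":
    truth = (10.0, 10.0, 50.0, 50.0)
    pred = (20.0, 20.0, 60.0, 60.0)
    print(round(iou(pred, truth), 4))  # 0.3913
    print(is_correct_localization(pred, truth))  # False
```

In this example the two 40×40 boxes overlap in a 30×30 region, so IoU = 900 / 2300 ≈ 0.39, which falls below the assumed 0.5 threshold; under that convention the prediction would count as a miss even though it partly covers the object.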



Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene understanding
  2. Machine learning
    1. Learning paradigms


Information & Contributors

Information

Published In

ACM Conferences

WebMedia '22: Proceedings of the Brazilian Symposium on Multimedia and the Web

November 2022

389 pages

ISBN:9781450394093

DOI:10.1145/3539637

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 November 2022

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

CAPES

Conference

WebMedia '22

WebMedia '22: Brazilian Symposium on Multimedia and Web

November 7 - 11, 2022

Curitiba, Brazil

Acceptance Rates

Overall Acceptance Rate 270 of 873 submissions, 31%

Bibliometrics & Citations

Article Metrics

Total Citations: 5
Total Downloads: 134
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 1

Reflects downloads up to 20 Jan 2025

Citations

Cited By

Pokuciński S, Mrozek D (2024). Object Detection with YOLOv5 in Indoor Equirectangular Panoramas. Procedia Computer Science 225:C, 2420-2428. Online publication date: 4-Mar-2024.
https://dl.acm.org/doi/10.1016/j.procs.2023.10.233

Ye H, Wang Y (2023). Residual Transformer YOLO for Detecting Multi-Scale Crowded Pedestrian. Applied Sciences 13:21, 12032. Online publication date: 4-Nov-2023.
https://doi.org/10.3390/app132112032

Enamoto L, Weigang L, Filho G, Costa P (2023). Generic Multimodal Gradient-based Meta Learner Framework. 2023 26th International Conference on Information Fusion (FUSION), 1-8. Online publication date: 28-Jun-2023.
https://doi.org/10.23919/FUSION52260.2023.10224143

Singla S, Chadha R (2023). Detecting Criminal Activities From CCTV by using Object Detection and machine Learning Algorithms. 2023 3rd International Conference on Intelligent Technologies (CONIT), 1-6. Online publication date: 23-Jun-2023.
https://doi.org/10.1109/CONIT59222.2023.10205699

Weigang L, Martins L, Ferreira N, Miranda C, Althoff L, Pessoa W, Farias M, Jacobi R, Rincon M (2022). Heuristic Once Learning for Image & Text Duality Information Processing. 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta), 1353-1359. Online publication date: Dec-2022.
https://doi.org/10.1109/SmartWorld-UIC-ATC-ScalCom-DigitalTwin-PriComp-Metaverse56740.2022.00195
