DOI: 10.1145/3539637.3557929
short-paper

Once Learning for Looking and Identifying Based on YOLO-v5 Object Detection

Published: 07 November 2022

Abstract

Object detection is an essential capability of computer vision solutions. It has gained attention in recent years as a core component of “Once learning” and “Few-shot learning” mechanisms. This research analyzes the ability of the machine learning framework “You Only Look Once” (YOLO) to perform object localization in a “Heuristic once learning” context. It also studies the advantages and practical limitations of YOLO by experimenting with two types of implementation: 1) the simplest one (a.k.a. tiny YOLO), and 2) the first version of YOLO. The case studies are carried out on various visual data types and object contexts, such as object deformation caused by fast-forward frames, spatial distortion caused by isometric projection, and gaming images with abnormal objects. Finally, we build a dataset for a new task called “Heuristic once learning”. Results using YOLO-v5 under these conditions show that YOLO had difficulty generalizing simple abstractions of the characters, pointing to the need for new approaches to such challenges.

Published In

WebMedia '22: Proceedings of the Brazilian Symposium on Multimedia and the Web
November 2022
389 pages
ISBN:9781450394093
DOI:10.1145/3539637

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Few-shot learning
  2. Object Detection
  3. Once learning
  4. YOLO

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Funding Sources

  • CAPES

Conference

WebMedia '22
WebMedia '22: Brazilian Symposium on Multimedia and Web
November 7 - 11, 2022
Curitiba, Brazil

Acceptance Rates

Overall Acceptance Rate 270 of 873 submissions, 31%

Article Metrics

  • Downloads (Last 12 months): 34
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 20 Jan 2025

Cited By

  • (2024) Object Detection with YOLOv5 in Indoor Equirectangular Panoramas. Procedia Computer Science 225:C (2420-2428). DOI: 10.1016/j.procs.2023.10.233. Online publication date: 4-Mar-2024
  • (2023) Residual Transformer YOLO for Detecting Multi-Scale Crowded Pedestrian. Applied Sciences 13:21 (12032). DOI: 10.3390/app132112032. Online publication date: 4-Nov-2023
  • (2023) Generic Multimodal Gradient-based Meta Learner Framework. 2023 26th International Conference on Information Fusion (FUSION) (1-8). DOI: 10.23919/FUSION52260.2023.10224143. Online publication date: 28-Jun-2023
  • (2023) Detecting Criminal Activities From CCTV by using Object Detection and machine Learning Algorithms. 2023 3rd International Conference on Intelligent Technologies (CONIT) (1-6). DOI: 10.1109/CONIT59222.2023.10205699. Online publication date: 23-Jun-2023
  • (2022) Heuristic Once Learning for Image & Text Duality Information Processing. 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta) (1353-1359). DOI: 10.1109/SmartWorld-UIC-ATC-ScalCom-DigitalTwin-PriComp-Metaverse56740.2022.00195. Online publication date: Dec-2022
