research-article

Generic Skeleton Object Detection Framework with Gradient Maps

Authors:

Mengyu Huang

ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing

Article No.: 41, Pages 1 - 8

https://doi.org/10.1145/3604078.3604119

Published: 26 October 2023

Abstract

In real-world applications, we find a special type of object on which detectors perform poorly, which we call skeleton objects. Only a relatively small percentage of the pixels inside their bounding boxes are truly meaningful, and the hollows of skeleton objects contain a lot of cluttered background information. Based on observation and experience from previous practice, we use the gradient map to improve the detection of skeleton objects, because the gradient map effectively sharpens foreground information with regular texture and smooths cluttered background information, which meets our requirements. We therefore propose the gradient attention module (GAM), which lets the gradient map guide the network's learning of semantic information from the original image. We also construct a dataset for skeleton object detection containing 3131 images in 7 categories. We conduct experiments with several state-of-the-art object detection frameworks, such as Faster R-CNN, RetinaNet, and YOLOv5, and our method clearly outperforms the corresponding baseline in almost all categories. The proposed method can also be easily generalized to various detection frameworks.
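
To make the idea of a gradient map concrete, the following is a minimal sketch in plain Python, not the paper's implementation. The abstract does not say which gradient operator is used, so the Sobel kernels here are an assumption, and `gradient_gate` is a hypothetical stand-in that only illustrates how a gradient map could re-weight features; it is not the GAM described in the paper.

```python
import math

# Sobel kernels. Assumption: the abstract does not name the gradient operator.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_map(image):
    """Return the Sobel gradient magnitude of a 2-D grayscale image.

    `image` is a list of equal-length rows. Borders are handled by
    replicating the nearest edge pixel.
    """
    h, w = len(image), len(image[0])

    def px(r, c):
        # Clamp coordinates so the 3x3 window never leaves the image.
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    v = px(r + dr - 1, c + dc - 1)
                    gx += SOBEL_X[dr][dc] * v
                    gy += SOBEL_Y[dr][dc] * v
            out[r][c] = math.hypot(gx, gy)
    return out


def gradient_gate(features, grad):
    """Hypothetical illustration (not the paper's GAM): scale each feature
    value by 1 + gradient magnitude normalized to [0, 1]."""
    peak = max(max(row) for row in grad)
    if peak == 0:
        # A flat image has no gradient; leave the features unchanged.
        return [row[:] for row in features]
    return [[f * (1.0 + g / peak) for f, g in zip(frow, grow)]
            for frow, grow in zip(features, grad)]
```

On an image with a single vertical edge, `gradient_map` responds only at the two columns next to the edge and is zero in the flat regions, which is the behavior the abstract relies on: structured foreground edges are emphasized while uniform areas are suppressed. A real detector would compute this on full-resolution images with an optimized library and would learn the attention weights rather than use a fixed scaling.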

Index Terms

Generic Skeleton Object Detection Framework with Gradient Maps
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection

Published In

ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing

May 2023

711 pages

ISBN:9798400708237

DOI:10.1145/3604078

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Hefei Municipal Natural Science Foundation
University Synergy Innovation Program of Anhui Province
Natural Science Foundation for the Higher Education Institutions of Anhui Province
National Natural Science Foundation of China

Conference

ICDIP 2023

ICDIP 2023: The 15th International Conference on Digital Image Processing

May 19 - 22, 2023

Nanjing, China
