DOI: 10.1145/3604078.3604119
research-article

Generic Skeleton Object Detection Framework with Gradient Maps

Published: 26 October 2023

Abstract

In real-world applications, we observe a special type of object on which detectors perform poorly, which we call skeleton objects. Only a relatively small percentage of the pixels inside their bounding boxes are truly meaningful, and the hollows of skeleton objects contain a lot of cluttered background information. Based on observation and experience from previous practice, we use the gradient map to improve the detection of skeleton objects: the gradient map effectively sharpens foreground information with regular texture and smooths cluttered background information, which meets our requirements. We therefore propose the GAM (gradient attention module), which lets the gradient map guide the network's learning of semantic information from the original image. We also construct a dataset for skeleton object detection, containing 3131 images in 7 categories. We conduct experiments with several state-of-the-art object detection frameworks, such as Faster R-CNN, RetinaNet, and YOLOv5, and our method clearly outperforms the corresponding baseline in almost all categories. Moreover, the proposed method can be easily generalized to various detection frameworks.
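
As a rough illustration of what a gradient map is, the sketch below computes a gradient-magnitude image with Sobel kernels in plain Python. The abstract does not say which gradient operator the authors use, so the Sobel choice, the edge-replication border handling, and the function names `gradient_map` and `attention_weights` are all assumptions made for this example. In particular, `attention_weights` is only a hypothetical way to turn gradient magnitude into spatial weights and is not the paper's GAM.

```python
import math

# Sobel kernels (an assumed choice; the abstract does not name the operator).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_map(image):
    """Return the gradient magnitude of a 2-D grayscale image (list of rows).

    Border pixels are handled by replicating the nearest edge pixel.
    """
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so the 3x3 window never leaves the image.
        r = min(max(r, 0), height - 1)
        c = min(max(c, 0), width - 1)
        return image[r][c]

    result = []
    for r in range(height):
        row = []
        for c in range(width):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = pixel(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * value
                    gy += SOBEL_Y[dr + 1][dc + 1] * value
            row.append(math.hypot(gx, gy))
        result.append(row)
    return result


def attention_weights(grad):
    """Hypothetical spatial weights in [0, 1]: magnitude divided by its maximum."""
    peak = max(max(row) for row in grad)
    if peak == 0:
        return [[0.0 for _ in row] for row in grad]
    return [[g / peak for g in row] for row in grad]


# Toy 5x5 image: a thin bright vertical stroke on a flat background.
toy = [[0, 0, 9, 0, 0] for _ in range(5)]
grad = gradient_map(toy)
weights = attention_weights(grad)
```

On this toy image the response concentrates on the two edges of the stroke and is zero on the flat background. A real detector would more likely compute the map with an image library and feed it to a learned module instead of using fixed weights.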

References

[1]
Bharat Singh and Larry S. Davis. 2017. An Analysis of Scale Invariance in Object Detection - SNIP. CoRR abs/1711.08189 (2017). arXiv:1711.08189 http://arxiv.org/abs/1711.08189
[2]
Bharat Singh, Mahyar Najibi, and Larry S. Davis. 2018. SNIPER: Efficient Multi-Scale Training. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (Montréal, Canada) (NIPS’18). Curran Associates Inc., Red Hook, NY, USA, 933
[3]
Yanghao Li, Yuntao Chen, Naiyan Wang, and Zhaoxiang Zhang. 2019. Scale-Aware Trident Networks for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
[4]
Jeong-Seon Lim, Marcella Astrid, Hyun-Jin Yoon, and Seung-Ik Lee. 2021. Small Object Detection using Context and Attention. In 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). 181–186. https://doi.org/10.1109/ICAIIC51459.2021.9415217
[5]
Han Qiu, Yuchen Ma, Zeming Li, Songtao Liu, and Jian Sun. 2020. BorderDet: Border Feature for Dense Object Detection. CoRR abs/2007.11056 (2020). arXiv:2007.11056 https://arxiv.org/abs/2007.11056
[6]
Glenn Jocher. 2020. ultralytics/yolov5. Retrieved Nov 22, 2022 from https://github.com/ultralytics/yolov5
[7]
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2017. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1137–1149
[8]
K. Simonyan and A. Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Science (2014)
[9]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778. https://doi.org/10.1109/CVPR.2016.90
[10]
Zhaowei Cai and Nuno Vasconcelos. 2018. Cascade R-CNN: Delving Into High Quality Object Detection. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6154–6162. https://doi.org/10.1109/CVPR.2018.00644
[11]
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 779–788. https://doi.org/10.1109/CVPR.2016.91
[12]
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single Shot MultiBox Detector. In Computer Vision – ECCV 2016, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 21–37
[13]
Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. 2019. FCOS: Fully Convolutional One-Stage Object Detection. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 9626–9635. https://doi.org/10.1109/ICCV.2019.00972
[14]
Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. 2017. Feature Pyramid Networks for Object Detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 936–944. https://doi.org/10.1109/CVPR.2017.106
[15]
Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. 2018. Path Aggregation Network for Instance Segmentation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8759–8768. https://doi.org/10.1109/CVPR.2018.00913
[16]
Songtao Liu, Di Huang, and Yunhong Wang. 2019. Learning Spatial Fusion for Single-Shot Object Detection. CoRR abs/1911.09516 (2019). arXiv:1911.09516 http://arxiv.org/abs/1911.09516
[17]
P. Viola and M. Jones. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, Vol. 1. https://doi.org/10.1109/CVPR.2001.990517
[18]
N. Dalal and B. Triggs. 2005. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 1. 886–893 vol. 1. https://doi.org/10.1109/CVPR.2005.177
[19]
Pedro Felzenszwalb, David McAllester, and Deva Ramanan. 2008. A discriminatively trained, multiscale, deformable part model. In 2008 IEEE Conference on Computer Vision and Pattern Recognition. 1–8. https://doi.org/10.1109/CVPR.2008.4587597
[20]
Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan. 2010. Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 9 (2010), 1627–1645. https://doi.org/10.1109/TPAMI.2009.167
[21]
Ross B. Girshick, Pedro F. Felzenszwalb, and David McAllester. 2011. Object Detection with Grammar Models. In Proceedings of the 24th International Conference on Neural Information Processing Systems (Granada, Spain) (NIPS’11). Curran Associates Inc., Red Hook, NY, USA, 442–450
[22]
Lubomir Bourdev, Subhransu Maji, Thomas Brox, and Jitendra Malik. 2010. Detecting People Using Mutually Consistent Poselet Activations. In Proceedings of the 11th European Conference on Computer Vision: Part VI (Heraklion, Crete, Greece) (ECCV’10). Springer-Verlag, Berlin, Heidelberg, 168–181
[23]
Long Zhu, Yuanhao Chen, Alan Yuille, and William Freeman. 2010. Latent hierarchical structural learning for object detection. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 1062–1069. https://doi.org/10.1109/CVPR.2010.5540096
[24]
Naresh Kumar and Nagarajan Sukavanam. 2018. Motion Trajectory for Human Action Recognition Using Fourier Temporal Features of Skeleton Joints. Journal of Image and Graphics 6 (01 2018), 174–180. https://doi.org/10.18178/joig.6.2.174-180
[25]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Z. Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. CoRR abs/1912.01703 (2019). arXiv:1912.01703 http://arxiv.org/abs/1912.01703
[26]
Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. 2019. MMDetection: Open MMLab Detection Toolbox and Benchmark. CoRR abs/1906.07155 (2019). arXiv:1906.07155 http://arxiv.org/abs/1906.07155
[27]
Martin Zinkevich, Markus Weimer, Alexander Smola, and Lihong Li. 2010. Parallelized Stochastic Gradient Descent. Advances in Neural Information Processing Systems 23, 2595–2603
[28]
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2020. Focal Loss for Dense Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 2 (2020), 318–327. https://doi.org/10.1109/TPAMI.2018.2858826

    Published In

    ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing
    May 2023
    711 pages
    ISBN:9798400708237
    DOI:10.1145/3604078

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Gradient feature
    2. Object detection
    3. Skeleton object

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Hefei Municipal Natural Science Foundation
    • University Synergy Innovation Program of Anhui Province
    • Natural Science Foundation for the Higher Education Institutions of Anhui Province
    • National Natural Science Foundation of China

    Conference

    ICDIP 2023
