Abstract
Object detection is a core task in computer vision. Non-Maximum Suppression (NMS), which suppresses redundant predictions, is widely used in convolution-based detectors. However, the sequential nature of NMS prevents parallel execution and thus limits inference speed, and the recall of NMS-dependent detectors also degrades in scenes with dense, heavily overlapping objects. In this paper, we propose a real-time, end-to-end detector based on YOLOF (You Only Look One-level Feature). The proposed methods introduce no additional parameters or attention mechanisms, making them practical for real-time applications. Specifically, we propose a stop-gradient strategy that trains only a subset of the parameters to address the weak supervision in one-to-one label assignment. We also present auxiliary losses to strengthen the supervision of negative samples during training, and use semantic anchor optimization to suppress the other anchors at the same location. These techniques allow the improved YOLOF to discard NMS within a 1 mAP gap and achieve faster inference. Our YOLOF-CSP-D53-DC5 achieves 42.7 mAP, only 0.5 mAP lower than the original version. Additionally, our YOLOF-R50 achieves 37.1 mAP at 38 FPS, exceeding state-of-the-art networks by more than 1.5 times in inference speed.
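The sequential bottleneck the abstract refers to can be seen in a minimal NumPy sketch of classic greedy NMS (an illustration of the standard algorithm, not the paper's implementation; box layout and the 0.5 IoU threshold are assumptions): each iteration's suppression decision depends on which box the previous iteration kept, so the loop cannot be parallelized.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and
    suppress all remaining boxes that overlap it too much.
    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,)."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the kept box with every remaining candidate
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # the next iteration only sees survivors of this one: a
        # data dependency that forces sequential execution
        order = order[1:][iou <= iou_thresh]
    return keep
```

An end-to-end detector with one-to-one label assignment sidesteps this loop entirely, since each object ideally receives exactly one high-scoring prediction.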
Notes
- 1.
For simplicity, we use \(\pi_{oto}\) and \(\pi_{otm}\) to denote one-to-one and one-to-many label assignments, respectively. \(\mathrm{YOLOF}_{nms}\) and \(\mathrm{YOLOF}_{end}\) represent the NMS-dependent and NMS-independent YOLOF, respectively.
- 2.
For clarity, we omit the image index \(k\).
Acknowledgements
This work was partially supported by the Guangdong Artificial Intelligence and Digital Economy Laboratory (Guangzhou).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Xi, X., Huang, Y., Wu, W., Luo, R. (2024). End-to-End Object Detection with YOLOF. In: Huang, DS., Zhang, C., Zhang, Q. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14868. Springer, Singapore. https://doi.org/10.1007/978-981-97-5600-1_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5599-8
Online ISBN: 978-981-97-5600-1