VLP Based Open-set Object Detection with Improved RT-DETR

Authors:

Yanwei Zhang

CAIBDA '24: Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms

Pages 101 - 106

https://doi.org/10.1145/3690407.3690424

Published: 24 October 2024

Abstract

Traditional object detectors are remarkably accurate on the categories they were trained on, but they cannot detect novel categories. This paper proposes a method for open-set object detection that generates pseudo-labels with a Vision-Language Pre-trained (VLP) model. The approach enables traditional object detectors to perform open-set object detection and can be generalized to all object detectors. The paper also introduces two improvements to RT-DETR: first, the RepC3 block in the fusion module is replaced with Manhattan Self-Attention (MaSA) to better construct global features; second, MPDIoU loss is used instead of GIoU loss. On the Pascal VOC07+12 dataset, the improved RT-DETR increases mAP by 3.1%, 3.8%, and 3.1% for all classes, base classes, and novel classes, respectively. For open-set object detection, the proposed method reaches 64.6% mAP on novel classes, a 1.3% improvement over zero-shot detection (ZSD) methods.
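To make the loss substitution concrete, the following is a minimal sketch of the two box-regression losses named in the abstract. It follows the standard published definitions of GIoU and MPDIoU as I understand them, not the paper's own code; the function names, the `(x1, y1, x2, y2)` box format, and the example image size are assumptions chosen for illustration.

```python
# Illustrative sketch only: plain-Python GIoU and MPDIoU losses for one box pair.
# Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.

def iou_and_union(a, b):
    """Return (IoU, union area) for two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union, union

def giou_loss(a, b):
    """1 - GIoU, where GIoU = IoU - (hull area - union) / hull area."""
    iou, union = iou_and_union(a, b)
    hull_w = max(a[2], b[2]) - min(a[0], b[0])
    hull_h = max(a[3], b[3]) - min(a[1], b[1])
    hull = hull_w * hull_h  # smallest box enclosing both inputs
    return 1.0 - (iou - (hull - union) / hull)

def mpdiou_loss(a, b, img_w, img_h):
    """1 - MPDIoU, penalizing squared corner distances scaled by image size."""
    iou, _ = iou_and_union(a, b)
    d_top_left = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    d_bottom_right = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2
    norm = img_w ** 2 + img_h ** 2
    return 1.0 - (iou - d_top_left / norm - d_bottom_right / norm)

# Hypothetical example: two predictions of equal size nested inside the target.
target = (0.0, 0.0, 10.0, 10.0)
centered = (4.0, 4.0, 6.0, 6.0)
cornered = (0.0, 0.0, 2.0, 2.0)

giou_centered = giou_loss(centered, target)
giou_cornered = giou_loss(cornered, target)
mpd_centered = mpdiou_loss(centered, target, img_w=10, img_h=10)
mpd_cornered = mpdiou_loss(cornered, target, img_w=10, img_h=10)
```

In this example the GIoU loss is the same for both nested predictions, because the enclosing box equals the target in each case, while the MPDIoU loss differs because it measures how far the corners are from the target's corners. This is one plausible reason to prefer MPDIoU for box regression; the abstract itself does not state the author's motivation.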

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision problems → Object detection

Published In

CAIBDA '24: Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms

June 2024

1206 pages

ISBN:9798400710247

DOI:10.1145/3690407

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CAIBDA 2024

CAIBDA 2024: 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms

June 21 - 23, 2024

Zhengzhou, China
