Abstract:
Self-supervised representation learning has shown promising results in recent years. However, most proposed methods are pre-trained on object-centric datasets with image-level pretext tasks. In this study, we follow DenseCL, which is pre-trained with pixel-level contrastive learning on scene-centric datasets. Our goal is to alleviate the false negative pairing problem in contrastive learning through consistency regularization. Our method outperforms the DenseCL and PixContrast models in most scenarios. In PASCAL VOC object detection, we observe improvements of 0.2% AP50 and 0.3% AP. In COCO object detection, we obtain gains of 0.3% AP and 0.7% AP. We also improve by 0.4% AP and 0.6% AP in COCO instance segmentation, and by 0.1% mAP and 0.9% mAP in PASCAL VOC semantic segmentation. Moreover, attention map visualization and k-nearest neighbour retrieval indicate a qualitative improvement from the proposed method.
Date of Conference: 15-18 May 2024
Date Added to IEEE Xplore: 23 July 2024
ISBN Information:
Print on Demand(PoD) ISSN: 2165-0608