Goggles detection is a new research topic with important academic and practical value. Building on face detection, it further determines whether people are wearing goggles correctly. Research on goggles detection faces two major challenges: the lack of large-scale datasets, and the high demand on model generalization, since goggle styles vary greatly. To address the former and facilitate future research, this paper releases a large-scale, high-quality goggles dataset named GogglesDet2023. To address the latter, a plug-and-play Clustering Transformer Module (CTM) is proposed to mitigate the poor generalization caused by intra-class diversity and by the inter-class similarity between goggles and ordinary glasses. We have applied the Clustering Transformer Module to several typical detectors, including YOLOX, YOLOv11, Faster R-CNN and Deformable-DETR. Extensive experiments show that the proposed method effectively reduces confusion between categories, improves model generalization, and ultimately achieves superior performance on the GogglesDet2023 dataset. The code and the dataset are publicly available at https://github.com/WCUSTC/GogglesDET.
Data Availability
The code and the dataset are publicly available at https://github.com/WCUSTC/GogglesDET.
This work was financially supported by the National Key Research and Development Plan of China under Grant No. 2017YFC0805100. The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of University of Science and Technology of China. The authors gratefully acknowledge this support.
This work was financially supported by the Fundamental Research Funds for the Central Universities under Grant No. WK2320000065.
Chong Wang: Conceptualization, Formal Analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - Original Draft. Zhuozhi Cheng: Data Curation. Adeel Akram: Validation. Qixing Zhang: Funding Acquisition, Writing - Review & Editing. Yongming Zhang: Supervision.
The authors declare that there are no known financial or other conflicts of interest associated with this article.
Wang, C., Cheng, Z., Akram, A. et al. Gogglesdet: goggles detection with clustering transformer and a new dataset. SIViP 19, 188 (2025). https://doi.org/10.1007/s11760-024-03778-x
