Gogglesdet: goggles detection with clustering transformer and a new dataset

  • Original Paper
Signal, Image and Video Processing

Abstract

Goggles detection is a new research topic with both academic and practical value. Building on face detection, it further determines whether a person is wearing goggles correctly. Research on goggles detection faces two major challenges: the lack of large-scale datasets, and the high demand on model generalization, since goggle styles vary greatly. To address the first challenge and facilitate future research, this paper releases a large-scale, high-quality goggles dataset named GogglesDet2023. To address the second, it proposes a plug-and-play Clustering Transformer Module (CTM) that alleviates the inadequate model generalization caused by intra-class diversity and by the inter-class similarity between goggles and ordinary glasses. We applied the CTM to several typical detectors, including YOLOX, YOLOv11, Faster R-CNN and Deformable-DETR. Extensive experiments show that the proposed method effectively reduces confusion between categories, improves model generalization, and achieves superior performance on the GogglesDet2023 dataset. The code and the dataset are publicly available at https://github.com/WCUSTC/GogglesDET.

Data Availability

The code and the dataset are publicly available at https://github.com/WCUSTC/GogglesDET.

Acknowledgements

This work was financially supported by the National Key Research and Development Plan of China under Grant No. 2017YFC0805100. The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of the University of Science and Technology of China. The authors gratefully acknowledge this support.

Funding

This work was financially supported by the Fundamental Research Funds for the Central Universities under Grant No. WK2320000065.

Author information

Contributions

Chong Wang: Conceptualization, Formal Analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - Original Draft. Zhuozhi Cheng: Data Curation. Adeel Akram: Validation. Qixing Zhang: Funding Acquisition, Writing - Review & Editing. Yongming Zhang: Supervision.

Corresponding author

Correspondence to Qixing Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known financial or other conflicts of interest associated with this article.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, C., Cheng, Z., Akram, A. et al. Gogglesdet: goggles detection with clustering transformer and a new dataset. SIViP 19, 188 (2025). https://doi.org/10.1007/s11760-024-03778-x

  • DOI: https://doi.org/10.1007/s11760-024-03778-x
