Abstract
Goggles detection is a new research topic with important academic and application value. Building on face detection, it further determines whether people are wearing goggles correctly. Research on goggles detection faces two major challenges: the lack of large-scale datasets, and the high demand on model generalization, since goggle styles vary greatly. To address the former and facilitate future research, this paper releases a large-scale, high-quality goggles dataset named GogglesDet2023. To address the latter, a plug-and-play Clustering Transformer Module (CTM) is proposed to alleviate the inadequate model generalization caused by intra-class diversity and by the inter-class similarity between goggles and ordinary glasses. We apply the Clustering Transformer Module to several typical detectors, including YOLOX, YOLOv11, Faster R-CNN and Deformable-DETR. Extensive experiments show that the proposed method effectively reduces confusion between categories, improves model generalization, and achieves superior performance on the GogglesDet2023 dataset. The code and the dataset are publicly available at https://github.com/WCUSTC/GogglesDET.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03778-x/MediaObjects/11760_2024_3778_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03778-x/MediaObjects/11760_2024_3778_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03778-x/MediaObjects/11760_2024_3778_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03778-x/MediaObjects/11760_2024_3778_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03778-x/MediaObjects/11760_2024_3778_Fig5_HTML.jpg)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03778-x/MediaObjects/11760_2024_3778_Fig6_HTML.jpg)
Data Availability
The code and the dataset are publicly available at https://github.com/WCUSTC/GogglesDET.
References
Yang, S., Luo, P., Loy, C.-C., Tang, X.: Wider face: A face detection benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5525–5533 (2016)
Tang, X., Du, D.K., He, Z., Liu, J.: Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the European conference on computer vision (ECCV), pp. 797–813 (2018)
Zhou, K., Liu, Z., Qiao, Y., Xiang, T., Loy, C.C.: Domain generalization: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 45(4), 4396–4415 (2022)
Shen, Z., Liu, J., He, Y., Zhang, X., Xu, R., Yu, H., Cui, P.: Towards out-of-distribution generalization: a survey. arXiv preprint arXiv:2108.13624 (2021)
Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., Yu, P.S.: Generalizing to unseen domains: a survey on domain generalization. IEEE Trans. Knowl. Data Eng. 35(8), 8052–8072 (2022)
Grubinger, T., Birlutiu, A., Schöner, H., Natschläger, T., Heskes, T.: Domain generalization based on transfer component analysis. In: Advances in Computational Intelligence: 13th International Work-Conference on Artificial Neural Networks, IWANN 2015, Palma de Mallorca, Spain, June 10-12, 2015. Proceedings, Part I 13, pp. 325–334 (2015). Springer
Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 22(2), 199–210 (2010)
Muandet, K., Balduzzi, D., Schölkopf, B.: Domain generalization via invariant feature representation. In: International Conference on Machine Learning, pp. 10–18 (2013)
Ghifary, M., Balduzzi, D., Kleijn, W.B., Zhang, M.: Scatter component analysis: a unified framework for domain adaptation and domain generalization. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1414–1430 (2016)
Hu, S., Zhang, K., Chen, Z., Chan, L.: Domain generalization via multidomain discriminant analysis. In: Uncertainty in Artificial Intelligence, pp. 292–302 (2020). PMLR
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021)
Khanam, R., Hussain, M.: Yolov11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725 (2024)
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020)
Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815–823 (2015)
Wang, X., Hua, Y., Kodirov, E., Hu, G., Garnier, R., Robertson, N.M.: Ranked list loss for deep metric learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5207–5216 (2019)
Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., Lin, D.: MMDetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755 (2014). Springer
Acknowledgements
This work was financially supported by the National Key Research and Development Plan of China under Grant No. 2017YFC0805100. The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of University of Science and Technology of China. The authors gratefully acknowledge this support.
Funding
This work was financially supported by the Fundamental Research Funds for the Central Universities under Grant No. WK2320000065.
Author information
Contributions
Chong Wang: Conceptualization, Formal Analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - Original Draft. Zhuozhi Cheng: Data Curation. Adeel Akram: Validation. Qixing Zhang: Funding Acquisition, Writing - Review & Editing. Yongming Zhang: Supervision.
Ethics declarations
Conflict of interest
The authors declare that there are no known financial or other conflicts of interest associated with this article.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, C., Cheng, Z., Akram, A. et al. Gogglesdet: goggles detection with clustering transformer and a new dataset. SIViP 19, 188 (2025). https://doi.org/10.1007/s11760-024-03778-x
DOI: https://doi.org/10.1007/s11760-024-03778-x