Gogglesdet: goggles detection with clustering transformer and a new dataset

Wang, Chong; Cheng, Zhuozhi; Akram, Adeel; Zhang, Qixing; Zhang, Yongming

doi:10.1007/s11760-024-03778-x

Original Paper
Published: 03 January 2025

Volume 19, article number 188, (2025)

Signal, Image and Video Processing

Chong Wang¹, Zhuozhi Cheng², Adeel Akram¹,², Qixing Zhang¹,²,³ & Yongming Zhang¹

Abstract

Goggles detection is a new research topic with important academic and practical value. Building on face detection, it further determines whether people are wearing goggles correctly. Research on goggles detection faces two major challenges: the lack of large-scale datasets, and the high demand on model generalization, since goggle styles vary greatly. To address the former and to facilitate future research, this paper releases GogglesDet2023, a large-scale, high-quality goggles dataset. For the latter, a plug-and-play Clustering Transformer Module (CTM) is proposed to alleviate the inadequate model generalization caused by the intra-class diversity of goggles and the inter-class similarity between goggles and ordinary glasses. We have applied the CTM to several typical detectors, including YOLOX, YOLOv11, Faster R-CNN and Deformable-DETR. Extensive experiments show that the proposed method effectively reduces confusion between categories, improves model generalization, and ultimately achieves superior performance on the GogglesDet2023 dataset. The code and the dataset are publicly available at https://github.com/WCUSTC/GogglesDET.
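To make the two generalization obstacles named in the abstract concrete, the following Python sketch measures them on toy feature vectors: intra-class diversity as the mean distance of each sample to its own class centroid, and inter-class similarity as the distance between the two closest class centroids. It is an illustrative diagnostic only, assuming features have already been extracted by some detector backbone; it is not the authors' CTM, and the function names (`centroid`, `separation_stats`) and the toy 2-D vectors are hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def separation_stats(embeddings, labels):
    """Hypothetical diagnostic: (intra-class spread, closest inter-class centroid gap).

    Requires at least two distinct labels.
    """
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    centroids = {label: centroid(vecs) for label, vecs in groups.items()}

    # Intra-class diversity: mean distance from each sample to its own class centroid.
    intra = sum(math.dist(vec, centroids[label])
                for vec, label in zip(embeddings, labels)) / len(embeddings)

    # Inter-class similarity: distance between the two closest class centroids.
    names = sorted(centroids)
    inter = min(math.dist(centroids[a], centroids[b])
                for i, a in enumerate(names) for b in names[i + 1:])
    return intra, inter


# Toy 2-D features (hypothetical): two goggles samples and two ordinary-glasses samples.
feats = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (1.0, 5.0)]
labels = ["goggles", "goggles", "glasses", "glasses"]
intra, inter = separation_stats(feats, labels)
print(intra, inter, intra / inter)  # 1.0 4.0 0.25
```

A large intra-class spread relative to the centroid gap would suggest that goggles and glasses features are easily confused. Lowering this ratio is one way to read the abstract's goal of countering intra-class diversity and inter-class similarity; how the CTM itself achieves this is described in the full paper, not here.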

Data Availability

The code and the dataset are publicly available at https://github.com/WCUSTC/GogglesDET.

References

Yang, S., Luo, P., Loy, C.C., Tang, X.: WIDER FACE: A face detection benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5525–5533 (2016)
Tang, X., Du, D.K., He, Z., Liu, J.: PyramidBox: A context-assisted single shot face detector. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 797–813 (2018)
Zhou, K., Liu, Z., Qiao, Y., Xiang, T., Loy, C.C.: Domain generalization: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 45(4), 4396–4415 (2022)
Shen, Z., Liu, J., He, Y., Zhang, X., Xu, R., Yu, H., Cui, P.: Towards out-of-distribution generalization: A survey. arXiv preprint arXiv:2108.13624 (2021)
Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., Yu, P.S.: Generalizing to unseen domains: A survey on domain generalization. IEEE Trans. Knowl. Data Eng. 35(8), 8052–8072 (2022)
Grubinger, T., Birlutiu, A., Schöner, H., Natschläger, T., Heskes, T.: Domain generalization based on transfer component analysis. In: Advances in Computational Intelligence: 13th International Work-Conference on Artificial Neural Networks, IWANN 2015, Palma de Mallorca, Spain, June 10–12, 2015, Proceedings, Part I, pp. 325–334. Springer (2015)
Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 22(2), 199–210 (2010)
Muandet, K., Balduzzi, D., Schölkopf, B.: Domain generalization via invariant feature representation. In: International Conference on Machine Learning, pp. 10–18 (2013)
Ghifary, M., Balduzzi, D., Kleijn, W.B., Zhang, M.: Scatter component analysis: A unified framework for domain adaptation and domain generalization. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1414–1430 (2016)
Hu, S., Zhang, K., Chen, Z., Chan, L.: Domain generalization via multidomain discriminant analysis. In: Uncertainty in Artificial Intelligence, pp. 292–302. PMLR (2020)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems 30 (2017)
Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)
Khanam, R., Hussain, M.: YOLOv11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725 (2024)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems 28 (2015)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
Wang, X., Hua, Y., Kodirov, E., Hu, G., Garnier, R., Robertson, N.M.: Ranked list loss for deep metric learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5207–5216 (2019)
Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., Lin, D.: MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, pp. 740–755. Springer (2014)

Acknowledgements

This work was financially supported by the National Key Research and Development Plan of China under Grant No. 2017YFC0805100. The numerical calculations in this paper were performed on the supercomputing system of the Supercomputing Center of the University of Science and Technology of China. The authors gratefully acknowledge this support.

Funding

This work was financially supported by the Fundamental Research Funds for the Central Universities under Grant No. WK2320000065.

Author information

Authors and Affiliations

SKLFS, University of Science and Technology of China, Hefei, 230026, Anhui, China
Chong Wang, Adeel Akram, Qixing Zhang & Yongming Zhang
IAT, University of Science and Technology of China, Hefei, 230031, Anhui, China
Zhuozhi Cheng, Adeel Akram & Qixing Zhang
iFIRE TEK Co., Ltd., Hefei, 230031, Anhui, China
Qixing Zhang

Contributions

Chong Wang: Conceptualization, Formal Analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - Original Draft. Zhuozhi Cheng: Data Curation. Adeel Akram: Validation. Qixing Zhang: Funding Acquisition, Writing - Review & Editing. Yongming Zhang: Supervision.

Corresponding author

Correspondence to Qixing Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known financial or other conflicts of interest associated with this article.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, C., Cheng, Z., Akram, A. et al. Gogglesdet: goggles detection with clustering transformer and a new dataset. SIViP 19, 188 (2025). https://doi.org/10.1007/s11760-024-03778-x

Received: 19 June 2023
Revised: 22 November 2024
Accepted: 10 December 2024
Published: 03 January 2025
DOI: https://doi.org/10.1007/s11760-024-03778-x