
SCDet: decoupling discriminative representation for dark object detection via supervised contrastive learning

  • Original article, published in The Visual Computer

Abstract

Despite the significant progress made in object detection, the potential of detectors to operate effectively in low-light environments remains under-explored. Recent methods perform dark object detection on the entire representation of dark images; however, they do not consider the potential entanglement between the dark disturbance and the discriminative information in dark images, and thus the learned representation may be sub-optimal. To address this issue, we propose supervised contrastive detection (SCDet), a novel unified framework that learns the underlying composition of dark images and decouples the discriminative component to facilitate dark object detection. Specifically, we introduce the dense decoupling contrastive (DDC) pretext task, which enforces feature consistency under a dark transformation, allowing the learned representation to become independent of the entanglement and thereby realize decoupling. Moreover, to drive the decoupled representation to be discriminative rather than a collapsed solution, we incorporate the supervised detection task as an extra optimization objective, resulting in a joint optimization pattern. The two tasks are complementary: the DDC task regularizes detection toward a more decoupling-friendly representation, while the supervised detection task guides the discriminative representation decoupling. As a result, SCDet achieves dark object detection by decoding the decoupled discriminative representation of dark images. Extensive experiments on four datasets demonstrate the effectiveness of our method in both synthetic and real-world scenarios. Code is available at https://github.com/TxLin7/SCDet.
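The abstract describes two coupled objectives: a DDC pretext task that enforces dense feature consistency between an image and its darkened counterpart, and a supervised detection loss that keeps the decoupled representation discriminative. The paper's exact formulation is not reproduced on this page, so the following is a minimal PyTorch-style sketch of that joint pattern; the dark transformation (gamma darkening plus Gaussian noise), the cosine-similarity dense consistency loss, and all names (`dark_transform`, `ddc_loss`, `joint_loss`, `lam`) are illustrative assumptions, not SCDet's actual implementation.

```python
import torch
import torch.nn.functional as F

def dark_transform(img, gamma_range=(2.0, 3.5), noise_std=0.05):
    """Synthesize a dark view of a normalized RGB image in [0, 1].

    Illustrative degradation only: gamma darkening plus Gaussian noise.
    The paper may use a different (e.g., physics-based) noise model.
    """
    gamma = torch.empty(1).uniform_(*gamma_range).item()
    dark = img.clamp(0.0, 1.0) ** gamma                 # suppress brightness
    dark = dark + noise_std * torch.randn_like(dark)    # add sensor-like noise
    return dark.clamp(0.0, 1.0)

def ddc_loss(feat_clean, feat_dark):
    """Dense consistency between per-location backbone features (B, C, H, W)
    of the clean and dark views: 1 - mean cosine similarity. Pulling the two
    views together encourages the representation to drop the dark disturbance.
    """
    f1 = F.normalize(feat_clean.flatten(2), dim=1)      # (B, C, H*W)
    f2 = F.normalize(feat_dark.flatten(2), dim=1)
    return 1.0 - (f1 * f2).sum(dim=1).mean()

def joint_loss(backbone, det_head, det_criterion, img, targets, lam=0.1):
    """Joint pattern: supervised detection on the dark view plus the DDC
    regularizer on a shared backbone; `lam` is a hypothetical trade-off weight."""
    dark = dark_transform(img)
    feat_clean = backbone(img)
    feat_dark = backbone(dark)
    det = det_criterion(det_head(feat_dark), targets)   # keeps features discriminative
    return det + lam * ddc_loss(feat_clean, feat_dark)  # keeps them decoupled
```

Note the complementarity the abstract points to: minimizing the consistency term alone admits a collapsed constant feature, while the supervised detection term rules out that trivial solution.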


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported in part by the Key Areas Research and Development Program of Guangzhou under Grant 2023B01J0029, the Science and Technology Research in Key Areas in Foshan under Grant 2020001006832, the Key-Area Research and Development Program of Guangdong Province under Grants 2018B010109007 and 2019B010153002, the Science and Technology Projects of Guangzhou under Grant 202007040006, the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069, the Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515012534, and the National Statistical Science Research Project of China under Grant 2022LY096.

Author information


Corresponding author

Correspondence to Guoheng Huang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Human and animal rights

This article does not include any research conducted by the authors on human participants.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lin, T., Huang, G., Yuan, X. et al. SCDet: decoupling discriminative representation for dark object detection via supervised contrastive learning. Vis Comput 40, 3357–3369 (2024). https://doi.org/10.1007/s00371-023-03039-x

