
SSoB: searching a scene-oriented architecture for underwater object detection

Abstract

Underwater object detection (UOD) suffers from low detection accuracy because of environmental degradations such as haze-like effects, color distortion, and imaging noise. We therefore tackle object detection under compounded environmental degradations, a setting that greatly challenges existing deep learning-based detectors. We propose a neural architecture search (NAS)-based deep learning network for the UOD task that automatically discovers scene-oriented feature representations. Our network combines a unified macro-detector with a novel mixed anti-aliasing block (MAaB)-based search space. The macro-detector learns intrinsic feature representations automatically from underwater images containing various environmental degradations and completes the subsequent detection tasks. The MAaB-based search space is designed for complex underwater scenes: the candidate operator MAaB packs multiple kernel sizes and anti-aliased convolutions into a single block to boost contextual representation capacity and robustness to degradation factors. Finally, a differentiable search strategy guides the whole learning process toward scene-friendly results. Extensive experiments demonstrate that our method outperforms state-of-the-art detectors by a large margin; more importantly, it also surpasses other popular detectors under severe environmental degradation.
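
The abstract names two ingredients: a mixed anti-aliasing candidate operator and a differentiable (DARTS-style) selection among candidates. The PyTorch sketch below is a minimal, hypothetical rendering of these ideas, assuming a MixConv-style channel split across depthwise kernels and blur pooling (in the spirit of Zhang's anti-aliased CNNs) for shift-robust downsampling; the actual SSoB layer configuration, kernel sizes, and low-pass filter are not given on this page and are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: low-pass blur, then stride-2 subsampling."""

    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        self.channels = channels
        # 3x3 binomial filter approximating a Gaussian low-pass kernel
        # (an illustrative choice, not taken from the paper).
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)
        self.register_buffer("kernel", (k / k.sum()).repeat(channels, 1, 1, 1))

    def forward(self, x):
        # Depthwise convolution with the fixed blur kernel.
        return F.conv2d(x, self.kernel, stride=self.stride,
                        padding=1, groups=self.channels)


class MAaB(nn.Module):
    """Hypothetical MAaB candidate: channel splits pass through depthwise
    convolutions of different kernel sizes, are fused by a pointwise
    convolution, and are optionally blur-pooled when downsampling."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7), stride: int = 1):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        c = channels // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c, bias=False)
            for k in kernel_sizes
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.down = BlurPool2d(channels) if stride == 2 else nn.Identity()

    def forward(self, x):
        splits = torch.chunk(x, len(self.branches), dim=1)
        x = torch.cat([b(s) for b, s in zip(self.branches, splits)], dim=1)
        return self.down(self.fuse(x))


class MixedOp(nn.Module):
    """DARTS-style continuous relaxation: the output is a softmax-weighted
    sum of all candidate operators, so the architecture weights (alpha) can
    be learned by gradient descent alongside the network weights."""

    def __init__(self, candidates):
        super().__init__()
        self.ops = nn.ModuleList(candidates)
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```

For example, `MixedOp([MAaB(48, (3, 5, 7)), MAaB(48, (3, 5))])` mixes two MAaB variants at one search position; after search converges, the candidate with the largest alpha would be retained, which is the standard DARTS derivation step.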

Notes

  1. http://www.urpc.org.cn/index.html.

  2. https://github.com/open-mmlab/mmdetection.

Author information

Corresponding author

Correspondence to Xin Fan.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that we have no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "SSoB: Searching a Scene-Oriented Architecture for Underwater Object Detection."

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yuan, W., Fu, C., Liu, R. et al. SSoB: searching a scene-oriented architecture for underwater object detection. Vis Comput 39, 5199–5208 (2023). https://doi.org/10.1007/s00371-022-02654-4
