Abstract
Long-term monitoring and recognition of underwater organisms are of great significance in marine ecology, fisheries science, and many other disciplines. Traditional techniques in this field, including those based on manual fishing and those based on sonar, have notable shortcomings. Specifically, the method based on manual fishing is time-consuming and unsuitable for scientific research, while the sonar-based method suffers from low acoustic image accuracy and large echo errors. In recent years, the rapid development of deep learning and its excellent performance in computer vision tasks have made vision-based solutions feasible. However, research in this area remains insufficient in two main respects. First, to our knowledge, there is still a lack of large-scale datasets of underwater organism images with accurate annotations. Second, given the limited hardware resources of underwater devices, an underwater organism detection algorithm that is both accurate and lightweight enough to run inference in real time is still lacking. As an attempt to fill these research gaps to some extent, we established the Multiple Kinds of Underwater Organisms (MKUO) dataset with accurate bounding box annotations of taxonomic information, which consists of 10,043 annotated images covering eighty-four underwater organism categories. Based on our benchmark dataset, we evaluated a series of existing object detection algorithms to obtain their accuracy and complexity indicators as a baseline for future reference. In addition, we propose a novel lightweight module, namely the Sparse Ghost Module, designed especially for object detection networks. By substituting the standard convolution with our proposed module, the network complexity can be significantly reduced and the inference speed greatly improved without an obvious loss in detection accuracy.
To make our results reproducible, the dataset and the source code are available online at https://cslinzhang.github.io/MKUO-and-Sparse-Ghost-Module/.
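The abstract does not spell out the internals of the Sparse Ghost Module, so the following is only a rough illustration of why replacing a standard convolution with a Ghost-style module can reduce complexity. It counts parameters for the original Ghost module from GhostNet, not for the module proposed here. The function names, the ratio `s`, and the cheap-operation kernel size `d` are assumptions made for this sketch, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights in a GhostNet-style Ghost module (bias ignored).

    Assumption: this follows the original GhostNet design, not the
    Sparse Ghost Module, whose details are not given in the abstract.
    A primary k x k convolution produces c_out / s intrinsic feature maps,
    and cheap d x d depthwise operations derive the remaining
    (s - 1) * c_out / s "ghost" maps from them.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s")
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k   # ordinary convolution, fewer outputs
    cheap = (s - 1) * intrinsic * d * d  # one d x d filter per ghost map
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    std = standard_conv_params(256, 256, 3)
    ghost = ghost_module_params(256, 256, 3, s=2, d=3)
    print(f"standard: {std}, ghost: {ghost}, ratio: {std / ghost:.2f}")
```

With `s = 2`, the Ghost-style layer needs roughly half the weights of the standard convolution, because the cheap depthwise term is small compared with the primary convolution. The accuracy and speed actually achieved by the Sparse Ghost Module can only be judged from the paper's own experiments.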
Index Terms
- An Underwater Organism Image Dataset and a Lightweight Module Designed for Object Detection Networks