Research Article

An Underwater Organism Image Dataset and a Lightweight Module Designed for Object Detection Networks

Published: 07 February 2024

Abstract

Long-term monitoring and recognition of underwater organisms are of great significance in marine ecology, fisheries science, and many other disciplines. Traditional techniques in this field, including manual fishing-based and sonar-based ones, are usually flawed. Specifically, methods based on manual fishing are time-consuming and unsuitable for scientific research, while sonar-based methods suffer from low acoustic image accuracy and large echo errors. In recent years, the rapid development of deep learning and its excellent performance in computer vision tasks have made vision-based solutions feasible. However, research in this area is still insufficient in two main aspects. First, to our knowledge, there is still a lack of large-scale datasets of underwater organism images with accurate annotations. Second, given the limited hardware resources of underwater devices, an underwater organism detection algorithm that is both accurate and lightweight enough to infer in real time is still lacking. As an attempt to fill these research gaps, we established the Multiple Kinds of Underwater Organisms (MKUO) dataset, which provides accurate bounding box annotations with taxonomic information and consists of 10,043 annotated images covering eighty-four underwater organism categories. Based on our benchmark dataset, we evaluated a series of existing object detection algorithms to obtain their accuracy and complexity indicators as baselines for future reference. In addition, we propose a novel lightweight module, the Sparse Ghost Module, designed especially for object detection networks. By substituting standard convolution with our proposed module, network complexity can be significantly reduced and inference speed greatly improved without obvious loss of detection accuracy.
To make our results reproducible, the dataset and the source code are available online at https://cslinzhang.github.io/MKUO-and-Sparse-Ghost-Module/.
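The abstract does not detail the internals of the Sparse Ghost Module, but the parameter arithmetic behind Ghost-style modules (GhostNet, Han et al., 2020), on which the name suggests it builds, illustrates why replacing a standard convolution reduces network complexity: a standard convolution produces all output channels directly, whereas a Ghost-style module produces only a fraction of them with a standard convolution and generates the rest with cheap depthwise operations. The sketch below is illustrative only; the `ratio` and depthwise kernel size `d` are assumed values, not necessarily the paper's actual configuration.

```python
def conv_params(c_in, c_out, k):
    # Parameter count of a standard k x k convolution (bias ignored).
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, d=3, ratio=2):
    # Ghost-style module: a standard conv produces c_out / ratio
    # "intrinsic" feature maps; cheap d x d depthwise convs generate
    # the remaining (ratio - 1) ghost maps from each intrinsic one.
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k        # standard convolution part
    cheap = intrinsic * (ratio - 1) * d * d   # depthwise "cheap" operations
    return primary + cheap

# Example layer: 128 -> 256 channels with 3 x 3 kernels.
std = conv_params(128, 256, 3)     # 294,912 parameters
ghost = ghost_params(128, 256, 3)  # 147,456 + 1,152 = 148,608 parameters
print(f"standard: {std}, ghost: {ghost}, reduction: {std / ghost:.2f}x")
```

With `ratio=2`, roughly half the parameters (and FLOPs, which scale the same way per spatial position) are removed, which is the kind of complexity reduction the abstract claims for substituting standard convolutions.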



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 5
May 2024, 650 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3613634
Editor: Abdulmotaleb El Saddik

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 February 2024
      • Online AM: 11 January 2024
      • Accepted: 8 January 2024
      • Revised: 19 November 2023
      • Received: 23 February 2023
Published in TOMM Volume 20, Issue 5
