Research Article

An Underwater Organism Image Dataset and a Lightweight Module Designed for Object Detection Networks

Published: 07 February 2024

Abstract

Long-term monitoring and recognition of underwater organisms are of great significance in marine ecology, fisheries science, and many other disciplines. Traditional techniques in this field, including manual fishing-based and sonar-based ones, are usually flawed. Specifically, methods based on manual fishing are time-consuming and unsuitable for scientific research, while sonar-based methods suffer from low acoustic image accuracy and large echo errors. In recent years, the rapid development of deep learning and its excellent performance in computer vision tasks have made vision-based solutions feasible. However, research in this area is still insufficient in two main aspects. First, to our knowledge, there is still a lack of large-scale datasets of underwater organism images with accurate annotations. Second, given the limited hardware resources of underwater devices, an underwater organism detection algorithm that is both accurate and lightweight enough to infer in real time is still lacking. As an attempt to fill these research gaps, we established the Multiple Kinds of Underwater Organisms (MKUO) dataset, which provides accurate bounding box annotations with taxonomic information and consists of 10,043 annotated images covering eighty-four underwater organism categories. Based on our benchmark dataset, we evaluated a series of existing object detection algorithms to obtain their accuracy and complexity indicators as baselines for future reference. In addition, we propose a novel lightweight module, the Sparse Ghost Module, designed especially for object detection networks. By substituting standard convolution with our proposed module, network complexity can be significantly reduced and inference speed greatly improved without obvious loss of detection accuracy.
To make our results reproducible, the dataset and the source code are available online at https://cslinzhang.github.io/MKUO-and-Sparse-Ghost-Module/.
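The abstract does not detail the internals of the Sparse Ghost Module, but the parameter arithmetic behind Ghost-style modules (GhostNet, Han et al., 2020), on which the name suggests it builds, illustrates why replacing a standard convolution reduces network complexity: a standard convolution produces all output channels directly, whereas a Ghost-style module produces only a fraction of them with a standard convolution and generates the rest with cheap depthwise operations. The sketch below is illustrative only; the `ratio` and depthwise kernel size `d` are assumed values, not necessarily the paper's actual configuration.

```python
def conv_params(c_in, c_out, k):
    # Parameter count of a standard k x k convolution (bias ignored).
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, d=3, ratio=2):
    # Ghost-style module: a standard conv produces c_out / ratio
    # "intrinsic" feature maps; cheap d x d depthwise convs generate
    # the remaining (ratio - 1) ghost maps from each intrinsic one.
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k        # standard convolution part
    cheap = intrinsic * (ratio - 1) * d * d   # depthwise "cheap" operations
    return primary + cheap

# Example layer: 128 -> 256 channels with 3 x 3 kernels.
std = conv_params(128, 256, 3)     # 294,912 parameters
ghost = ghost_params(128, 256, 3)  # 147,456 + 1,152 = 148,608 parameters
print(f"standard: {std}, ghost: {ghost}, reduction: {std / ghost:.2f}x")
```

With `ratio=2`, roughly half the parameters (and FLOPs, which scale the same way per spatial position) are removed, which is the kind of complexity reduction the abstract claims for substituting standard convolutions.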



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 5
May 2024, 650 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3613634
Editor: Abdulmotaleb El Saddik

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 February 2024
      • Online AM: 11 January 2024
      • Accepted: 8 January 2024
      • Revised: 19 November 2023
      • Received: 23 February 2023
Published in TOMM Volume 20, Issue 5
