Elsevier

Ecological Informatics

Volume 72, December 2022, 101923
YoloXT: An object detection algorithm for marine benthos

https://doi.org/10.1016/j.ecoinf.2022.101923

Highlights

  • We propose YoloXT, a new object detection method for marine benthos.

  • Deformable Coordinate Attention is proposed to improve feature extraction.

  • We design a new Feature Pyramid S2win Transformer network to detect diverse targets.

  • We introduce Optimal Anchor Assignment to avoid redundant noise and imbalanced samples.

Abstract

In recent years, the marine economy has developed rapidly, and human demand for marine resources has increased greatly. Target detection technology has a wide range of applications and prospects in seabed observation and ocean engineering. However, the accuracy and robustness of existing target detection methods are low because of the complex underwater environment, poor lighting, and poor quality of undersea images and videos. To solve these problems, this paper proposes YoloXT, a new quantitative identification method for marine benthos. YoloXT introduces the DECA (Deformable Coordinate Attention) module, which expands spatial awareness during feature extraction and learns image features more effectively. In addition, FPST-PAN (Feature Pyramid S2win Transformer, Improved Path Aggregation Network) is proposed to handle the diversity of marine benthic targets. It further integrates deep and shallow features through multi-scale skip connections and a Transformer, improving the model's ability to cope with complex and changeable marine environments. Finally, a positive and negative sample assignment strategy, OAA (Optimal Anchor Assignment), is proposed and applied to the detection head. It avoids the unbalanced distribution of positive and negative samples caused by traditional sample assignment methods and by noise in marine benthos images. Experiments on the IOC-URPC dataset show that the mAP of YoloXT is 3.9% higher than that of YoloX, reaching 70.9%. YoloXT demonstrates excellent performance in the quantitative identification of marine organisms and can contribute to the exploitation and conservation of marine resources. The source code is publicly available at https://github.com/F1veZhang/YOLOXT.

Introduction

Human activities have caused excessive disturbance to the ocean, and the study of marine organisms is of great importance to its ecological conservation. Imaging technologies for marine biodiversity have advanced rapidly in recent decades (Bicknell et al., 2016), and the use of human-occupied vehicles (HOVs) and remotely operated vehicles (ROVs) has further improved the quality of underwater images and videos (Cutter et al., 2015). At the same time, with the support of artificial intelligence technology, underwater robots are combined with different types of imaging devices, providing more possibilities for the monitoring and research of marine life (Hai et al., 2020). However, because of the specificity and complexity of marine ecology, the large volume of images and videos cannot all be processed manually, and there is an urgent need for technologies that can automatically identify and mark underwater targets. The marine benthos target detection method YoloXT proposed in this paper can locate, classify, and count organisms in underwater images. YoloXT can be further used in species conservation, water environment monitoring, and fisheries resources research.

Early target detection methods, such as DPM (Deformable Part Model) (Felzenszwalb et al., 2008), consist of three steps: candidate region generation, feature extraction, and classification. These methods use sliding windows to predict bounding boxes, which is both ineffective and very time-consuming. There are two main reasons for the poor results. One is that the sliding-window region selection strategy is not targeted, and the other is that the extracted features are not robust to target diversity. After 2010, deep learning began to be applied in the field of image processing, which avoided the defects of traditional target detection methods. At present, deep learning target detection methods can be divided into anchor-based and anchor-free methods. Anchor-based methods can be further divided into two-stage and one-stage methods. The two-stage methods, represented by R-CNN (Girshick et al., 2014), Fast R-CNN (Girshick, 2015), and Faster R-CNN (Ren et al., 2015), divide the target detection task into two parts: they first generate candidate frames, then refine and classify them. The one-stage methods, represented by the Yolo series (Bochkovskiy et al., 2020; Redmon et al., 2016; Redmon and Farhadi, 2018), complete target classification and position regression directly. Compared with the two-stage methods, the one-stage methods have a simpler network structure and faster detection speed. However, because of the class imbalance problem, their detection accuracy is often lower than that of the two-stage methods. After 2018, anchor-free target detection methods began to develop rapidly. CornerNet (Law and Deng, 2018) regards target detection as a keypoint detection task and obtains the predicted frame by detecting two keypoints at the upper-left and lower-right corners of the target frame. CenterNet (Duan et al., 2019) directly regresses the size of the target frame and obtains the predicted frame from that size and the position of the center point. Compared with anchor-based methods, anchor-free methods reduce the number of parameters, improve training and prediction efficiency, and alleviate the imbalance between positive and negative samples.
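The two anchor-free box encodings described above can be illustrated with a short sketch. It is a minimal illustration only, assuming axis-aligned boxes in pixel coordinates; the function names `box_from_corners` and `box_from_center` are hypothetical and do not come from CornerNet, CenterNet, or YoloXT.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_from_corners(top_left: Tuple[float, float],
                     bottom_right: Tuple[float, float]) -> Box:
    """Corner-based encoding: a box is defined by two corner keypoints."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    if x2 <= x1 or y2 <= y1:
        raise ValueError("bottom-right corner must lie below and right of top-left")
    return (x1, y1, x2, y2)


def box_from_center(center: Tuple[float, float],
                    size: Tuple[float, float]) -> Box:
    """Center-based encoding: a box is a center point plus a regressed (width, height)."""
    (cx, cy), (w, h) = center, size
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Both encodings describe the same box here (illustrative values).
corner_box = box_from_corners((10.0, 20.0), (50.0, 80.0))
center_box = box_from_center((30.0, 50.0), (40.0, 60.0))
print(corner_box)
print(center_box)
```

Neither encoding needs a predefined set of anchor boxes, which is why such methods avoid anchor-related hyperparameters.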

A large body of research on content-based processing of underwater images has been produced in recent years. Delphine Mallet (Mallet and Pelletier, 2014) comprehensively reviewed the underwater video technology used to study or monitor coastal biodiversity from 1952 to 2012, analyzing the applications, advantages, and disadvantages of underwater visual census (UVC), remote sensing, acoustics, experimental capture, and underwater video. Meng-Che Chuang (Chuang et al., 2016) proposed an underwater fish recognition framework for live fish detection, which consists of a fully unsupervised feature learning technique and an error-resilient classifier. This framework works well when the fish species are not very different, but it struggles to monitor multiple kinds of marine life. Hongwei Qin (Qin et al., 2016) proposed a framework for underwater live fish recognition in unconstrained natural environments. This framework relies only on simple cascaded deep networks, without data augmentation or a model ensemble structure, so there is still considerable room for improvement. Jenni Raitoharju (Raitoharju et al., 2018) studied automatic identification methods for large benthic invertebrates and proposed a database for evaluating and testing such methods, providing a benchmark for invertebrate detection. Huang (Hai et al., 2019) applied Faster R-CNN to a high-performance autonomous underwater vehicle (AUV) to verify its effectiveness in marine fish detection and its adaptability to changes in marine environments with good water quality. To address the challenges of unconstrained underwater ocean images (brightness differences, background confusion, etc.), Salman (Salman et al., 2019) applied a Gaussian mixture network to determine the background pixel distribution and then separated foreground fish from background objects for complex fish detection. Under good water quality and with regular target shapes, the above methods can achieve good fixed-point observation, but their results on real images are often unsatisfactory. To address turbid water and irregular crab shapes in crab ponds, Shuo Cao (Cao et al., 2020) combined SSD and FPN to construct a real-time detection model. It can automatically estimate the biomass of live crabs in water and achieved good results in detecting a single species of crab. Jinde Zhu (Zhu et al., 2022) proposed a Yolov4-based method for identifying various marine organisms. However, this architecture is only suitable for detecting marine organisms in aquaculture ponds and is not suitable for target recognition and localization in real seabed images. At present, most target detection work on marine organisms addresses a single species, and there are few studies on identifying multiple kinds of marine benthos. The identification and quantitative analysis of marine benthos face complex environments, serious occlusion, small targets, and frequent deformation. Existing methods struggle to meet this demand, and better solutions are needed.

Based on the above research, this paper proposes a new method, YoloXT, for the identification and quantitative analysis of a variety of marine benthos. Our work aims to improve the feature representation capability of the network so that it captures semantic information in images more accurately, to fuse multi-scale features to obtain comprehensive contextual semantic information, and to alleviate the imbalance of positive and negative samples caused by complex background interference in the undersea environment. YoloXT is a one-stage anchor-free algorithm, and our main contributions are as follows:

  1. A new attention module, DECA (Deformable Coordinate Attention), is added. This module expands the spatial perception range of feature extraction, learns low-resolution feature maps effectively, and improves detection accuracy.

  2. To deal with the diversity of targets, FPST-PAN (Feature Pyramid S2win Transformer, Improved Path Aggregation Network) is proposed to replace the FPN (Lin et al., 2017) and PAN (Liu et al., 2018) structures in YoloX (Ge et al., 2021a). It integrates shallow and deep features so that the network can better detect targets of different scales.

  3. A new positive and negative sample allocation strategy, OAA, is proposed. Applying it to the detection head avoids redundant noise and the imbalance of positive and negative samples.
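To make the sample-imbalance problem concrete, the sketch below shows a conventional fixed-threshold assignment rule, in which a candidate box is positive only if its IoU with the ground truth reaches a threshold. This is a hypothetical baseline for illustration and is not OAA, whose details are not given in this section. The 64x64 grid, the 16-pixel candidates, and the 0.5 threshold are all assumed values.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_by_iou(candidates: List[Box], gt: Box, pos_thresh: float = 0.5) -> List[bool]:
    """Fixed-threshold baseline (not OAA): positive if and only if IoU >= pos_thresh."""
    return [iou(c, gt) >= pos_thresh for c in candidates]


# One small ground-truth object on a 64x64 image tiled with 16x16 candidate boxes.
gt_box: Box = (18.0, 18.0, 34.0, 34.0)
candidates: List[Box] = [
    (float(x), float(y), float(x + 16), float(y + 16))
    for y in range(0, 64, 16)
    for x in range(0, 64, 16)
]
labels = assign_by_iou(candidates, gt_box)
num_pos = sum(labels)
num_neg = len(labels) - num_pos
print(f"positives: {num_pos}, negatives: {num_neg}")
```

With a single small object, almost every candidate becomes a negative sample. Adaptive assignment strategies are designed to reduce this kind of skew.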

Experiments on the IOC-URPC dataset show that YoloXT achieves an mAP of 70.9%. These results indicate that the method is suitable for target detection of marine benthos and has practical application value.
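For readers unfamiliar with the metric, mAP is the mean of the per-class average precision (AP). The sketch below uses the simple non-interpolated form of AP. The evaluation protocol actually used for the reported figure is not stated in this section, so the formula choice, the class names `class_a` and `class_b`, and all scores are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence score, matched a ground-truth box?)


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Non-interpolated AP: precision summed at each true positive, divided by num_gt."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_gt


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """mAP: unweighted mean of AP over all classes."""
    aps = [average_precision(dets, num_gt) for dets, num_gt in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical detections for two hypothetical classes, each with 2 ground-truth objects.
results = {
    "class_a": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "class_b": ([(0.95, True), (0.6, True)], 2),
}
map_value = mean_average_precision(results)
print(round(map_value, 4))
```

Benchmark toolkits usually apply an interpolated precision-recall curve and a fixed IoU matching threshold, so their numbers can differ from this simplified form.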

Section snippets

Related work

A convolutional neural network (CNN) is a multi-layer supervised learning neural network in which convolution and pooling layers are the core feature extraction modules. A traditional CNN applies convolution to the image, then pooling, and finally up-samples the smaller feature map to the original image size for prediction. However, some information is lost during down-sampling and up-sampling, which is a major drawback of traditional convolution.
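The information loss from down-sampling followed by up-sampling can be seen on a toy feature map. The sketch below is a minimal pure-Python illustration, assuming 2x2 max pooling and nearest-neighbour up-sampling. The specific operators in any given network may differ, and the values are invented.

```python
from typing import List

Grid = List[List[float]]


def max_pool_2x2(x: Grid) -> Grid:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]), 2)]
        for r in range(0, len(x), 2)
    ]


def upsample_nearest_2x(x: Grid) -> Grid:
    """Nearest-neighbour up-sampling by a factor of 2 in each dimension."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


feature_map: Grid = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 9.0],
    [5.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
restored = upsample_nearest_2x(max_pool_2x2(feature_map))
changed = sum(a != b for row_a, row_b in zip(feature_map, restored)
              for a, b in zip(row_a, row_b))
for row in restored:
    print(row)
print(f"{changed} of 16 values differ from the original")
```

The restored map has the original size, but each 2x2 block now holds a single value, so fine spatial detail such as the isolated 9.0 is smeared over its neighbourhood.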

Method

Marine benthic organisms differ greatly from one another, and the quality of underwater images is poor. False detections and missed detections are common problems of traditional marine benthos target detection methods. The standard convolutional neural network has a strong inductive bias, converges quickly, and generalizes well, but it struggles with complex underwater images. If the feature extraction ability of a convolutional network is improved only by increasing the depth of the layers, it

Dataset

Two datasets were used for the experiments: one is a set of 15,046 images of seamount organisms provided by the Institute of Oceanography (IOC) of the Chinese Academy of Sciences, and the other is the dataset of the Target Recognition Group of the 2020 China Underwater Robot Professional Competition (URPC 2020). In this study, only the URPC 2020 dataset and associated experimental results are presented, as the IOC dataset is not yet publicly available. URPC 2020 contains many images of the seafloor,

Discussion

The development of marine benthos detection technology is limited by the difficulty of obtaining underwater images, the cumbersome labeling process, and the lack of experimental data. There are few existing methods for identifying marine benthos, and most studies have not carried out quantitative detection and analysis. The YoloXT method proposed in this study can make up for this deficiency. This study proposes a new channel attention module, DECA, and incorporates it into the backbone. By using DECA, YoloXT

Conclusion

Deep-sea target detection technology is one of the hotspots in marine science, and the quantitative detection of benthos is of great significance to the study of marine ecology. This paper proposes a new quantitative detection method for marine benthos, called YoloXT. To address the low light, low contrast, and blurred images found in the real underwater environment, an image enhancement method suitable for that environment is used to preprocess the image to

Author contributions

Jianyi Zhang: Conceptualization, Methodology, Design of experiment, Implementation of code, Writing - Original Draft (focus on diagrams and tables). Yongpan Wang: Data collection, Design of experiment, Writing - Original Draft (focus on textual descriptions). Xianchong Xu: Design of experiment, Implementation of code.

Declaration of Competing Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.

Acknowledgments

The research was supported by the Center for Ocean Mega-Science, Chinese Academy of Sciences (KEXUE2019GZ04) and GuangHe Fund of Dawning Information Industry Co., Ltd. (GHFUNd202107021586).

References (53)

  • M.C. Chuang et al. A feature learning and object recognition framework for underwater fish images. IEEE Trans. Image Process. (2016)

  • G. Cutter et al. Automated detection of rockfish in unconstrained underwater videos using Haar cascades and a new image dataset: labeled fishes in the wild

  • J. Dai et al. Deformable convolutional networks

  • A. Dosovitskiy et al. An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations (ICLR) (2021)

  • K. Duan et al. CenterNet: Keypoint triplets for object detection

  • P. Felzenszwalb et al. A discriminatively trained, multiscale, deformable part model

  • Z. Ge et al. YOLOX: Exceeding YOLO series in 2021. arXiv Preprint (2021)

  • Z. Ge et al. OTA: Optimal transport assignment for object detection

  • R. Girshick. Fast R-CNN

  • R. Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation

  • X. Glorot et al. Deep sparse rectifier neural networks

  • P. Goyal et al. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv Preprint (2017)

  • H.A. Hai et al. Faster R-CNN for marine organisms detection and recognition using data augmentation. Neurocomputing (2019)

  • H.A. Hai et al. A review on underwater autonomous environmental perception and target grasp, the challenge of robotic organism capture. Ocean Eng. (2020)

  • K. He et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2015)

  • Q. Hou et al. Coordinate attention for efficient mobile network design

1 Co-first author.

2 Jianyi Zhang, Yongpan Wang, Xianchong Xu have contributed equally to this thesis.
