YoloXT: An object detection algorithm for marine benthos
Introduction
Human activities now disturb the ocean excessively, which makes the study of marine organisms important for ocean conservation. Imaging technologies for marine biodiversity have advanced rapidly in recent decades (Bicknell et al., 2016), and the use of human-occupied vehicles (HOVs) and remotely operated vehicles (ROVs) has further improved the quality of underwater images and videos (Cutter et al., 2015). At the same time, supported by artificial intelligence, underwater robots equipped with different types of imaging devices offer new possibilities for monitoring and studying marine life (Hai et al., 2020). However, because of the specificity and complexity of marine ecosystems, the resulting images and videos are too numerous to process manually, and technologies that can automatically identify and annotate underwater targets are urgently needed. The marine benthos detection method proposed in this paper, YoloXT, locates, classifies and counts organisms in underwater images, and can further support species conservation, water environment monitoring, and fisheries resource research.
Early object detection methods, such as the Deformable Part Model (DPM) (Felzenszwalb et al., 2008), consist of three steps: candidate region generation, feature extraction, and classification. These methods slide windows over the image to predict bounding boxes, which is both inaccurate and time-consuming. There are two main reasons for their poor performance: the sliding-window region selection is not targeted at likely object locations, and the extracted features are not robust to the diversity of targets. After 2010, deep learning began to be applied to image processing, effectively overcoming these shortcomings of traditional detectors. Current deep learning detectors can be divided into anchor-based and anchor-free methods, and anchor-based methods can be further divided into two-stage and one-stage methods. Two-stage methods, represented by R-CNN (Girshick et al., 2014), Fast R-CNN (Girshick, 2015) and Faster R-CNN (Ren et al., 2015), split detection into two parts: they first generate candidate boxes (region proposals) and then refine and classify them. One-stage methods, represented by the Yolo series (Bochkovskiy et al., 2020; Redmon et al., 2016; Redmon and Farhadi, 2018), perform target classification and position regression directly. Compared with two-stage methods, one-stage methods have a simpler network structure and faster detection speed; however, because of the class imbalance problem, their detection accuracy is often lower. After 2018, anchor-free detectors began to develop rapidly. CornerNet (Law and Deng, 2018) treats object detection as keypoint detection and obtains each predicted box by detecting two keypoints: the top-left and bottom-right corners of the bounding box.
CenterNet (Duan et al., 2019) directly regresses the size of the target box and obtains the predicted box from this size and the position of the center point. Compared with anchor-based methods, anchor-free methods reduce the number of parameters, improve training and inference efficiency, and alleviate the imbalance between positive and negative samples.
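To make the two anchor-free box parameterizations above concrete, the following Python sketch decodes boxes either from a corner pair (CornerNet-style) or from a center point plus a regressed width and height (CenterNet-style). It is a minimal illustration under simplifying assumptions, not the implementation of either cited paper: the helper names `box_from_corners`, `box_from_center` and `decode_center_map`, the score `threshold`, and the grid `stride` are hypothetical, and corner grouping, sub-cell offsets and peak extraction are omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def box_from_corners(top_left: Tuple[float, float],
                     bottom_right: Tuple[float, float]) -> Box:
    """Corner style: a box is fully defined by a paired top-left and bottom-right keypoint."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    return (x1, y1, x2, y2)


def box_from_center(cx: float, cy: float, w: float, h: float) -> Box:
    """Center style: a box is the predicted center plus the regressed width and height."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def decode_center_map(heatmap: List[List[float]],
                      sizes: List[List[Tuple[float, float]]],
                      stride: int,
                      threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Turn a per-cell center heatmap plus per-cell (w, h) regressions into boxes.

    Every cell whose score exceeds `threshold` is treated as an object center;
    no anchor shapes are involved. Peak extraction (e.g. 3x3 max-pool NMS) is omitted.
    """
    detections = []
    for row, (scores, whs) in enumerate(zip(heatmap, sizes)):
        for col, (score, (w, h)) in enumerate(zip(scores, whs)):
            if score > threshold:
                # Map the grid cell back to image coordinates (cell center).
                cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
                detections.append((box_from_center(cx, cy, w, h), score))
    return detections
```

For instance, with `stride=8`, a 2x2 heatmap whose only score above 0.5 sits at row 0, column 1, with a regressed size of (16, 8), yields a single box centered at (12, 4). No anchor shapes appear anywhere: the network only has to say where centers are and how large the objects are.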
A large body of research on content-based processing of underwater images has appeared in recent years. Delphine Mallet (Mallet and Pelletier, 2014) reviewed the underwater video techniques used to study or monitor coastal biodiversity from 1952 to 2012, analyzing the applications, advantages and disadvantages of underwater visual census (UVC), remote sensing, acoustics, experimental capture, and underwater video. Meng-Che Chuang (Chuang et al., 2016) proposed an underwater fish recognition framework for live fish detection, consisting of a fully unsupervised feature learning technique and an error-resilient classifier. This framework works well when the fish species are similar to one another, but it struggles to monitor multiple kinds of marine life. Hongwei Qin (Qin et al., 2016) proposed a framework for recognizing live fish underwater in unconstrained natural environments. Because this framework relies only on simple cascaded deep networks and uses neither data augmentation nor model ensembling, it leaves considerable room for improvement. Jenni Raitoharju (Raitoharju et al., 2018) studied automatic identification methods for benthic macroinvertebrates and proposed a database for evaluating and testing such methods, providing a benchmark for invertebrate detection. Huang (Hai et al., 2019) deployed Faster R-CNN on a high-performance autonomous underwater vehicle (AUV), verifying its effectiveness for marine fish detection and its adaptability to changing marine environments with good water quality. To address the challenges of unconstrained underwater images (brightness variation, background confusion, etc.), Salman (Salman et al., 2019) used a Gaussian mixture model to estimate the distribution of background pixels and then separated foreground fish from background objects, enabling fish detection in complex scenes.
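The background-modelling idea attributed to Salman et al. (2019) can be illustrated with a much simpler relative of a Gaussian mixture: one running Gaussian per pixel. The Python sketch below is our simplification, not their method; a mixture model would keep several weighted Gaussians per pixel, and the class name `RunningGaussianBackground` and the parameters `alpha`, `k` and `init_var` are hypothetical choices. A pixel is labeled foreground when it lies more than `k` standard deviations from its background mean, and only background pixels update the model.

```python
from typing import List

Frame = List[List[float]]  # grayscale intensities, one inner list per image row


class RunningGaussianBackground:
    """Per-pixel background model with ONE Gaussian per pixel.

    A simplified stand-in for Gaussian-mixture background modelling:
    a mixture would keep K weighted Gaussians per pixel instead of one.
    """

    def __init__(self, first_frame: Frame, alpha: float = 0.05,
                 k: float = 2.5, init_var: float = 100.0):
        self.alpha = alpha  # learning rate of the running mean/variance
        self.k = k          # foreground if |x - mean| > k * std
        self.mean = [row[:] for row in first_frame]
        self.var = [[init_var] * len(row) for row in first_frame]

    def apply(self, frame: Frame) -> List[List[bool]]:
        """Return a foreground mask; update the model with background pixels only."""
        mask = []
        for i, row in enumerate(frame):
            mask_row = []
            for j, x in enumerate(row):
                mu, var = self.mean[i][j], self.var[i][j]
                is_fg = abs(x - mu) > self.k * var ** 0.5
                if not is_fg:
                    # Exponential moving update of this pixel's Gaussian.
                    self.mean[i][j] = (1 - self.alpha) * mu + self.alpha * x
                    self.var[i][j] = (1 - self.alpha) * var + self.alpha * (x - mu) ** 2
                mask_row.append(is_fg)
            mask.append(mask_row)
        return mask
```

With a uniform first frame of intensity 50 (so the initial standard deviation is 10 and the foreground threshold is 25 intensity levels), a second frame in which one pixel jumps to 200 produces a mask that flags only that pixel, while small fluctuations elsewhere are absorbed into the background model.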
Under good water quality and with regularly shaped targets, the above methods achieve good fixed-point observation, but their performance on real-world images is often unsatisfactory. To handle turbid water and irregular crab shapes in crab ponds, Shuo Cao (Cao et al., 2020) combined SSD and FPN into a real-time detection model that automatically estimates the biomass of live crabs in water and performs well at detecting a single crab species. Jinde Zhu (Zhu et al., 2022) proposed a Yolov4-based method for identifying various marine organisms. However, this architecture is suited only to detecting marine organisms in aquaculture ponds, not to recognizing and localizing targets in real seabed images. Most current work on marine organism detection addresses a single species, and few studies identify multiple kinds of marine benthos. The identification and quantitative analysis of marine benthos face difficulties such as complex environments, severe occlusion, small targets, and large shape deformations. Existing methods struggle to meet these demands, so better solutions are needed.
Building on this research, this paper proposes a new method, YoloXT, for identifying and quantitatively analyzing a variety of marine benthos. Our work aims to strengthen the network's feature representation so that it captures semantic information in images more accurately, to fuse multi-scale features to obtain comprehensive contextual semantic information, and to alleviate the positive/negative sample imbalance caused by complex background interference in the undersea environment. YoloXT is a one-stage anchor-free algorithm, and our main contributions are as follows:
1. A new attention module, DECA (Deformable Coordinate Attention), is added. It expands the spatial perception range of feature extraction, learns effectively from low-resolution feature maps, and improves detection accuracy.
2. To cope with the diversity of targets, FPST-PAN (Feature Pyramid S2win Transformer, Improved Path Aggregation Network) is proposed to replace the FPN (Lin et al., 2017) and PAN (Liu et al., 2018) structures in YoloX (Ge et al., 2021a). It fuses shallow and deep features so that the network can better detect targets of different scales.
3. A new positive and negative sample assignment strategy, OAA, is proposed. Applied in the detection head, it effectively mitigates redundant noise and the imbalance between positive and negative samples.
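This section does not detail DECA's deformable component, so the Python sketch below shows only the standard coordinate attention block (Hou et al., CVPR 2021) that DECA's name suggests it extends; that lineage is our assumption. Coordinate attention pools the feature map separately along height and width, passes both directional descriptors through a shared 1x1 convolution, and produces one attention weight per row and one per column, so the reweighting preserves positional information along each axis. For brevity it runs in plain Python on nested lists with caller-supplied weights, omits batch normalization, and uses ReLU in place of the original h-swish; `coordinate_attention`, `w_reduce`, `w_h` and `w_w` are hypothetical names.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed [channel][row][col]
Matrix = List[List[float]]         # 1x1 convolution weights indexed [out][in]


def _matvec(w: Matrix, v: List[float]) -> List[float]:
    """Apply a 1x1 convolution (a matrix-vector product) to one position."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def coordinate_attention(x: Tensor3, w_reduce: Matrix, w_h: Matrix, w_w: Matrix) -> Tensor3:
    """Standard coordinate attention (simplified); NOT the deformable DECA variant."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # 1) Directional pooling: average over width (one C-vector per row)
    #    and over height (one C-vector per column).
    pooled_h = [[sum(x[c][i]) / W for c in range(C)] for i in range(H)]
    pooled_w = [[sum(x[c][i][j] for i in range(H)) / H for c in range(C)] for j in range(W)]
    # 2) Shared 1x1 conv (C -> C/r) + ReLU over all H + W positions.
    hidden = [[max(0.0, z) for z in _matvec(w_reduce, v)] for v in pooled_h + pooled_w]
    hid_h, hid_w = hidden[:H], hidden[H:]
    # 3) Separate 1x1 convs (C/r -> C) + sigmoid: per-row and per-column weights.
    a_h = [[_sigmoid(z) for z in _matvec(w_h, v)] for v in hid_h]  # [row][channel]
    a_w = [[_sigmoid(z) for z in _matvec(w_w, v)] for v in hid_w]  # [col][channel]
    # 4) Reweight every position by its row weight and its column weight.
    return [[[x[c][i][j] * a_h[i][c] * a_w[j][c] for j in range(W)] for i in range(H)]
            for c in range(C)]
```

With all weights set to zero, every sigmoid outputs 0.5, so each activation is simply scaled by 0.25; learned weights instead let the block emphasize particular rows and columns in each channel.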
Experiments on the IOC-URPC dataset show that YoloXT achieves an mAP of 70.9%, indicating that the method is suitable for marine benthos detection and has substantial application value.
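The reported 70.9% is a mean average precision (mAP): the average precision (AP) of each class, averaged over classes. The IoU threshold and interpolation scheme behind that figure are not stated in this section, so the Python sketch below assumes all-point interpolation (as used by PASCAL VOC from 2010 onward) and takes as input detections that have already been matched to ground truth and sorted by confidence. The function names are hypothetical.

```python
from typing import List


def average_precision(is_tp: List[bool], num_gt: int) -> float:
    """AP for one class as the area under its precision-recall curve.

    `is_tp` flags each detection, already sorted by descending confidence, as a
    true positive (matched to a not-yet-matched ground-truth box at the chosen
    IoU threshold) or a false positive. `num_gt` is the number of ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp:
        tp += int(hit)
        fp += int(not hit)
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # All-point interpolation: precision at each rank becomes the maximum
    # precision reached at that rank or any later (higher-recall) rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP: the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, three ranked detections (hit, miss, hit) against two ground-truth objects give AP = 0.5 * 1 + 0.5 * 2/3, roughly 0.833; averaging such values over all benthos classes gives the mAP.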
Section snippets
Related work
A convolutional neural network (CNN) is a multi-layer neural network trained with supervised learning, in which convolution and pooling layers are the core feature-extraction modules. A traditional CNN convolves the image, applies pooling, and finally up-samples the smaller feature map back to the original image size for prediction. However, some information is lost during down-sampling and up-sampling, which is a major drawback of traditional convolutional networks.
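As a toy illustration of the information loss described above, the Python sketch below applies 2x2 max pooling and then 2x nearest-neighbor up-sampling to a small grid. The choice of these two operators is an assumption (the text does not name specific ones), the function names are hypothetical, and inputs are assumed to have even height and width.

```python
from typing import List

Grid = List[List[float]]  # single-channel feature map with even height and width


def max_pool_2x2(x: Grid) -> Grid:
    """Down-sample by keeping the maximum of each non-overlapping 2x2 block."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def nearest_upsample_2x(x: Grid) -> Grid:
    """Up-sample by repeating every value into a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(wide[:])                        # and repeat the row vertically
    return out
```

On a 4x4 grid with a single nonzero value in each 2x2 block, the 16 input values collapse to 4, and the up-sampled result spreads each maximum over its whole block: the round trip cannot recover where within a block a value sat, nor any non-maximal values.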
Method
Marine benthic organisms vary greatly in appearance, and underwater image quality is poor, so false and missed detections are common in traditional marine benthos detection methods. A standard convolutional neural network has a strong inductive bias, converges quickly, and generalizes well, but it has difficulty with complex underwater images. If the feature extraction ability of a convolutional network is improved only by increasing the depth of the layers, it
Dataset
Two datasets were used in the experiments: 15,046 images of seamount organisms provided by the Institute of Oceanography (IOC) of the Chinese Academy of Sciences, and the dataset of the Target Recognition Group of the 2020 China Underwater Robot Professional Competition (URPC 2020). Only the URPC 2020 dataset and its associated experimental results are presented in this study, because the IOC dataset is not yet publicly available. URPC 2020 contains many images of the seafloor,
Discussion
The development of marine benthos detection technology is limited by the difficulty of obtaining underwater images, the cumbersome labeling process, and the lack of experimental data. Few methods exist for identifying marine benthos, and most studies have not performed quantitative detection and analysis. The YoloXT proposed in this study helps fill this gap. This study proposes a new channel attention module, DECA, and incorporates it into the backbone. By using DECA, YoloXT
Conclusion
Deep-sea target detection is an active research topic in marine science, and the quantitative detection of benthos is of great significance for the study of marine ecology. This paper proposes a new quantitative detection method for marine benthos, called YoloXT. To address the low illumination, low contrast and blur of images taken in real underwater environments, an image enhancement method suited to such environments is used to preprocess the images to
Author contributions
Jianyi Zhang: Conceptualization, Methodology, Design of experiment, Implementation of code, Writing - Original Draft (focus on drawing diagrams and tables). Yongpan Wang: Data collection, Design of experiment, Writing - Original Draft (focus on textual descriptions). Xianchong Xu: Design of experiment, Implementation of code.
Declaration of Competing Interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.
Acknowledgments
The research was supported by the Center for Ocean Mega-Science, Chinese Academy of Sciences (KEXUE2019GZ04) and GuangHe Fund of Dawning Information Industry Co., Ltd. (GHFUNd202107021586).
References (53)
- Cao et al. (2020). Real-time robust detector for underwater live crabs based on deep learning. Comput. Electron. Agric.
- et al. (2022). A case study of utilizing YOLOT based quantitative detection algorithm for marine benthos. Ecol. Inform.
- Mallet and Pelletier (2014). Underwater video techniques for observing coastal marine biodiversity: a review of sixty years of publications (1952–2012). Fish. Res.
- Qin et al. (2016). Deepfish: accurate underwater live fish recognition with a deep architecture. Neurocomputing.
- Raitoharju et al. (2018). Benchmark database for fine-grained image classification of benthic macroinvertebrates. Image Vis. Comput.
- Salman et al. (2019). Real-time fish detection in complex backgrounds using probabilistic background modelling. Ecol. Inform.
- Zhang et al. (2022). Focal and efficient IoU loss for accurate bounding box regression. Neurocomputing.
- Ridnik et al. (2021). Asymmetric loss for multi-label classification. Proceedings of the IEEE/CVF International Conference on Computer Vision.
- Bicknell et al. (2016). Camera technology for monitoring marine biodiversity and human impact. Front. Ecol. Environ.
- Bochkovskiy et al. (2020). Yolov4: optimal speed and accuracy of object detection. arXiv Preprint.