Abstract
Underwater images often fail to satisfy human visual perception because visible light is scattered by particles and absorbed by water as it propagates underwater. Light absorption easily causes image distortion and reduces image contrast and brightness. Therefore, this work aims to improve the quality of underwater image processing, reduce the distortion rate of underwater images, and further improve the efficiency of underwater image extraction, processing, and tracking. It combines intelligent blockchain technology from emerging multimedia industries with existing image processing technology to improve the target detection capability of image processing algorithms. Firstly, the theory of visual saliency analysis (VSA) is studied, the steps of image processing using VSA are analyzed, and the visual saliency detection step is optimized on the basis of the original Itti model. Then, the theoretical basis and operating steps of the particle swarm optimization (PSO) algorithm in intelligent blockchain technology are studied. VSA theory is combined with PSO to design an underwater image processing algorithm and a target detection optimization algorithm for underwater images. The experimental results show that: (1) the method has a higher F value and a lower Mean Absolute Error. (2) Compared with the original image, the entropy of the image restored by this method is greatly improved and the information in the image increases, so the method has good performance; it also performs well in image definition, color, and brightness, and the quality of the restored image is better than that of other algorithms. (3) Compared with similar algorithms, the relative errors of this method are reduced to 2.56%, 3.24%, and 3.89%, respectively, indicating high accuracy. The research results can provide a reference for future underwater image processing and target detection research. In addition, the designed underwater image processing, target detection, and tracking algorithms can improve the detection efficiency and accuracy of underwater targets and help to accurately obtain underwater target images.
1 Introduction
The ocean is the origin of life, a crucial space for human survival, and a precious treasure for sustainable development. With the progress of society, the understanding of the ocean's strategic position and value is deepening. The mineral resources in the ocean mainly include seabed oil, combustible ice, and natural gas [1]. Among these resources, oil and natural gas production has reached one-fourth of global production. The ocean is also rich in nickel, cobalt, and other mineral resources, at concentrations much higher than on land [2]. However, humans have exploited only about 5% of the ocean, so further exploration and exploitation of the ocean is of great significance. Faced with the pressure of the continuous shortage of land resources, human beings now focus more on the development and rational application of marine resources. Because of the particularity of the marine environment, ocean exploration and development are often beyond direct human capacity. Various underwater robots have therefore emerged to carry out underwater operations in place of people [3]. In order to improve people's understanding of underwater images and expand exploration of the marine field, this work studies and optimizes the underwater target detection and tracking algorithm on the basis of existing research, to help improve the quality and efficiency of underwater target detection.
Underwater robot technology has gradually become a research hotspot and trend. Underwater robots can replace humans in underwater operations, such as marine fishery exploration, seawater sampling, and resource exploration. They can also break through the limitations of human underwater operations and work at great depths and in harsh environments for long periods. For example, China's manned submersible "Striver" has carried out deep-sea operations many times, with a maximum diving depth of more than 10,000 m [4]. As ocean exploration deepens, the role of underwater robots will become more important, and they have broad application prospects. At present, the most widely used unmanned underwater vehicles are divided into cabled remotely operated vehicles and uncabled autonomous underwater vehicles. A cabled remotely operated vehicle is controlled from the surface and equipped with operating tools such as thrusters; its underwater cameras and manipulators are powered from the surface. In contrast, a cable-free Autonomous Underwater Vehicle (AUV) carries its own energy, has a certain degree of intelligence, and can automatically complete navigation planning, obstacle avoidance, and task execution according to the underwater environment and the operation tasks in the water [5]. The environment-sensing layer of an AUV is the basis of an intelligent underwater vehicle, so underwater target detection based on AUVs has important research significance. Accurate identification of underwater targets is a difficulty in AUV-based underwater target detection and an important research topic in computer vision [6]. Underwater target detection technology mainly includes optical, acoustic, and magnetic detection technologies. A single sensor is affected by various factors that reduce the system's reliability and identification accuracy, leading to misjudgment and misidentification of underwater targets. Multi-sensor information fusion can effectively improve the system's robustness, expand the observation range, enhance data credibility, and improve recognition ability [7]. The underwater target detection system based on the particle swarm optimization (PSO) algorithm is image-based: it uses image preprocessing, segmentation, morphological processing, feature extraction, and other techniques to complete target recognition and realize machine vision. The detection objects in an underwater scene usually differ from the background, such as obstacles on the route, companion robots in cooperative underwater operations, and targets in underwater fishing operations [8]. Unlike general feature extraction methods, the target detection method based on visual saliency analysis (VSA) compares the target with the image background, so it is more suitable for underwater scenes with relatively simple backgrounds [9]. Jian et al. [10] surveyed the existing relatively mature and representative underwater image processing models and divided them into seven categories: enhancement, defogging, noise reduction, segmentation, salient target detection, color constancy, and restoration. They then objectively evaluated the current state and future development trends of underwater image processing [10]. Qian et al. [11] converted the original low-illumination image from the red-green-blue color space to the hue-saturation-intensity color space.
Then, the image's overall brightness was adaptively improved using a bilateral gamma correction function and the cuckoo search algorithm. In addition, a brightness-preserving double-histogram construction based on a visual saliency algorithm was proposed to perform brightness preservation and contrast analysis for low-illumination color images. Finally, the processed color space was converted back to obtain the enhanced image [11]. Kannan [12] identified objects in underwater images using an adaptive Gaussian mixture model, which performs accurate object segmentation with a predefined number of clusters. The initialization of the parameter set by optimization techniques such as the genetic algorithm, PSO, and differential evolution was analyzed. Differential evolution, known for accurate decision-making in fewer iterations, proved more suitable for initializing the number of clusters of the Gaussian mixture model; it was further used for object recognition together with the internal-distance shape-matching technique [12]. To sum up, previous studies show that in the field of underwater target detection and tracking, most scholars use visual saliency theory alone, or smart blockchain technology or PSO technology alone, and few researchers combine the two. Moreover, visual saliency analysis alone cannot handle more complex images and is therefore subject to large restrictions in use. This work therefore combines the two methods and theories: on the basis of ensuring the quality of underwater image extraction, it expands the theory and scope of application of the research methods.
Underwater image target detection and tracking technology is an important part of underwater optical vision technology. Image segmentation has always been a classic problem in image processing, especially for underwater images, and tracking and recognition of the target image is the key step of the whole target recognition system. According to the specific content of underwater image target detection and tracking, this work takes the underwater image target fusion detection and tracking method during navigation as the starting point. Based on visual saliency theory and smart blockchain technology, the image target detection algorithm and autonomous tracking are studied in combination with PSO. The research results expand the applicable fields of the underwater target detection algorithm, improve the quality and efficiency of underwater target detection, and provide further technical support and suggestions for subsequent autonomous underwater detection by robots. Meanwhile, they provide technical support for subsequent marine exploration and marine resource collection.
2 Experimental methods and procedures
2.1 Visual saliency analysis
VSA simulates the human visual attention mechanism, which ignores irrelevant areas in the image and focuses attention on the object of interest. Visual saliency detection highlights the salient object in the visual scene and obtains the binarized mask image of the object after segmentation [13]. Visual saliency detection is shown in Fig. 1:
Underwater scenes are usually simple; the background is mostly water or seabed sand. In the underwater target detection work, the target is generally different from the background. Therefore, the VSA method can be used to detect objects in simple underwater scene images [14].
According to the information processing mechanism, there are two kinds of visual saliency detection. One is the bottom-up approach, which is data-driven: objects that contrast strongly with their surrounding area in the visual field will be noticed [15]. This approach has no task guidance and is usually driven by low-level visual features such as color, intensity, orientation, and texture. The other is the top-down approach, which is task-driven and related to preset targets, such as expected information, color, and object features [16].
The classic Itti model belongs to the bottom-up category, and the extracted features are color, brightness, and orientation. This work analyzes the visual saliency detection process based on the Itti model [17]. The Itti visual saliency model is a visual attention model designed according to the visual nervous system of early primates. The model first uses Gaussian sampling to construct Gaussian pyramids of image color, brightness, and orientation. It then uses these pyramids to calculate the brightness, color, and orientation feature maps of the image. Finally, the brightness, color, and orientation saliency maps are obtained by combining the feature maps at different scales, and the final visual saliency map is obtained by summation. The specific structure of the Itti model is shown in Fig. 2:
In Fig. 2, the model extracts the primary visual features of an input image: color, brightness, and orientation. It uses center-surround operations at multiple scales to produce feature maps that embody saliency measures. After these feature maps are combined, the most salient spatial position in the image is obtained using the biological winner-take-all competition mechanism, and the position of attention is selected. Finally, the return-inhibition method completes the focus shift [18]. The model mainly includes feature extraction, saliency map generation, and attention focus shift. (1) Feature extraction: the primary visual features are extracted from the external input image in the feature extraction module; they mainly include color, motion, orientation, and brightness. (2) Saliency map generation: images of natural environments contain much redundancy, which the brain can effectively remove in order to focus on useful information. When the visual attention computational model processes an image, it likewise needs to effectively eliminate the redundant information in the natural image to prepare for the extraction of salient information. The information contained in a still image can be divided into two types: salient information and background information. Salient information can be obtained by removing the redundant information in the image. After the brightness, color, and orientation feature saliency maps are obtained, they are fused to obtain the final comprehensive saliency map. (3) Attention focus shift: after the final saliency map is generated, each target displayed in the saliency map competes for the focus of attention. The competition among targets in the saliency map is realized by the Winner-Take-All (WTA) mechanism, and the winner detected by the WTA mechanism is the focus of attention with the highest saliency. Since the target to be noticed is always the most salient among all targets participating in the competition and will always win, without a specific control mechanism the focus would always point to the same target, other targets would never get the chance to be noticed, and the focus of attention could not be shifted. Attention and focus shifting are achieved through the inhibition-of-return mechanism, the principle of shifting to nearby locations, and the determination of the size of the attention area [19].
Firstly, feature extraction is carried out. A nine-layer Gaussian pyramid is constructed, and the size of the Gaussian filter is 5 × 5. The original image is used as the 0th layer for layer-by-layer filtering to obtain the next layer. Then features are extracted for each layer [20]. The specific calculation is shown in Eqs. (1)–(5):
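(Eqs. (1)-(5) are not reproduced in the text. The following is a reconstruction based on the standard Itti channel definitions, consistent with the variable descriptions below, and is an assumed form rather than the authors' exact equations.)

$$I=\frac{r+g+b}{3}$$

$$R=r-\frac{g+b}{2},\qquad G=g-\frac{r+b}{2},\qquad B=b-\frac{r+g}{2}$$

$$Y=\frac{r+g}{2}-\frac{\left|r-g\right|}{2}-b$$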
R, G, B, and Y are the image's red, green, blue, and yellow color features. r, g, and b are the intensity values of the image's red, green, and blue channels. I is the brightness feature. The feature maps are then generated: according to the "center-surround difference", the difference between images at different layers of the pyramid is calculated to obtain the brightness, color, and orientation feature maps. Before the difference is calculated, the images at the two scales are interpolated to a common size [21]. The calculation method is as follows.
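(The feature-map equations themselves are likewise missing here; a reconstruction following the standard Itti center-surround definitions, assumed rather than taken from the paper, is given below.)

$$I(c,s)=\left|I(c)\ominus I(s)\right|$$

$$RG(c,s)=\left|\left(R(c)-G(c)\right)\ominus\left(G(s)-R(s)\right)\right|$$

$$BY(c,s)=\left|\left(B(c)-Y(c)\right)\ominus\left(Y(s)-B(s)\right)\right|$$

$$O(c,s,\theta)=\left|O(c,\theta)\ominus O(s,\theta)\right|$$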
\(\ominus\) denotes the center-surround (across-scale) difference, which unifies the image sizes by interpolating the smaller image to the larger scale and then subtracting point by point [22]. RG(c,s) and BY(c,s) are the red-green and blue-yellow color feature maps. I(c,s) is the brightness feature map. \(O(c,s,\theta)\) is the orientation feature map at orientation \(\theta\). c is the index of the center layer, and s is the index of the surround layer.
Finally, the saliency map is generated. First, the feature maps of each channel are normalized and combined to obtain the brightness, color, and orientation saliency maps. Then, these saliency maps are normalized and fused to obtain the comprehensive saliency map.
In the preliminary detection, the acquisition of the saliency map mainly considers brightness, color, orientation, and other factors. However, the depth factor also affects the saliency of underwater vision. Therefore, the preliminary detection results are combined with depth information to generate a comprehensive saliency map: depth features are used to refine and optimize the saliency map and improve its quality [23].
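To make the pipeline above concrete, the following is a minimal sketch of an Itti-style bottom-up saliency computation with optional depth weighting. It assumes OpenCV and NumPy are available, omits the Gabor orientation channel for brevity, and uses illustrative function names and parameters; it is not the authors' implementation.

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels=9):
    """Build a Gaussian pyramid; level 0 is the original image."""
    pyr = [img.astype(np.float32)]
    for _ in range(1, levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def center_surround(pyr, centers=(2, 3, 4), deltas=(3, 4)):
    """Across-scale differences |F(c) - F(s)|, s = c + delta, resized to level 0."""
    h, w = pyr[0].shape[:2]
    maps = []
    for c in centers:
        for d in deltas:
            s = c + d
            if s >= len(pyr):
                continue
            surround = cv2.resize(pyr[s], (pyr[c].shape[1], pyr[c].shape[0]),
                                  interpolation=cv2.INTER_LINEAR)
            maps.append(cv2.resize(np.abs(pyr[c] - surround), (w, h)))
    return maps

def normalize(m):
    """Simple range normalization to [0, 1] (a stand-in for Itti's N(.) operator)."""
    m = m - m.min()
    return m / (m.max() + 1e-8)

def saliency_map(bgr, depth=None):
    """Simplified bottom-up saliency from intensity and color-opponency channels."""
    b, g, r = [bgr[..., i].astype(np.float32) for i in range(3)]
    intensity = (r + g + b) / 3.0
    rg = r - g                      # red-green opponency (simplified)
    by = b - (r + g) / 2.0          # blue-yellow opponency (simplified)
    conspicuity = np.zeros_like(intensity)
    for channel in (intensity, rg, by):
        maps = center_surround(gaussian_pyramid(channel))
        conspicuity += normalize(sum(normalize(m) for m in maps))
    sal = normalize(conspicuity)
    if depth is not None:           # optional depth refinement, as described above
        sal = normalize(sal * normalize(depth.astype(np.float32)))
    return sal

# Example usage (hypothetical file names):
# img = cv2.imread("underwater.png")
# sal = saliency_map(img)
# cv2.imwrite("saliency.png", (sal * 255).astype(np.uint8))
```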
2.2 Image recognition system based on smart blockchain technology
Blockchain technology is a new distributed infrastructure and computing paradigm. It uses a block-and-chain data structure to verify and store data, a distributed node consensus algorithm to generate and update data, cryptography to ensure the security of data transmission and access, and smart contracts composed of automated script code to program and operate on data. In short, in a blockchain system, the transaction data generated by each participant are packaged into a data block after a period of time. The data blocks are arranged in chronological order to form a chain, and all participants hold the same data chain, which cannot be tampered with unilaterally [24]. Any modification of information can only be carried out with the consent of an agreed proportion of subjects; moreover, only new information can be added, and old information cannot be deleted or modified. Blockchain can achieve information sharing and consistent decision-making among multiple subjects, and it ensures that the identity of each subject and the transactions between subjects are tamper-proof, open, and transparent. Compared with traditional networks, blockchain has two core characteristics: data are difficult to tamper with, and the system is decentralized. Based on these two characteristics, the information recorded by the blockchain is more authentic and reliable [25].
The proposed image recognition system based on smart blockchain technology adopts a Web form with a front-end/back-end separation architecture. The front end is implemented using the Vue framework. Vue's core is the view layer, which is easy to use and to integrate with third-party libraries or existing projects; its componentization makes it easy to create single-page applications. The back-end blockchain includes blockchain services and tamper-detection services. A block is equivalent to a storage unit. Taking Bitcoin as an example, Bitcoin transaction records are stored in blocks, and the Bitcoin system generates a block every 10 min. Each data block generally includes two parts: the block header and the block body. The block header stores the hash values of the previous block and the current block, so that blocks are linked to each other, similar to a linked list in data structures [26].
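As an illustration of the hash-linked block structure described above, the sketch below chains blocks by storing the previous block's hash in each header and detects tampering by re-hashing. It is a toy example with hypothetical record contents, not the system's Hyperledger Fabric implementation.

```python
import hashlib
import json
import time

class Block:
    """Minimal block: the header holds the previous block's hash, the body holds records."""
    def __init__(self, records, prev_hash):
        self.timestamp = time.time()
        self.records = records          # e.g., image hashes or detection results (illustrative)
        self.prev_hash = prev_hash
        self.hash = self.compute_hash()

    def compute_hash(self):
        header = json.dumps({"t": self.timestamp,
                             "records": self.records,
                             "prev": self.prev_hash}, sort_keys=True)
        return hashlib.sha256(header.encode()).hexdigest()

def verify_chain(chain):
    """Tamper detection: every block must still match its stored hash and its link."""
    for i in range(1, len(chain)):
        if chain[i].prev_hash != chain[i - 1].hash:
            return False
        if chain[i].hash != chain[i].compute_hash():
            return False
    return True

# genesis = Block(["genesis"], "0" * 64)
# chain = [genesis, Block(["image_001: sha256=..."], genesis.hash)]
# print(verify_chain(chain))  # True; editing any stored record breaks verification
```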
The blockchain service of this system adopts the Hyperledger Fabric blockchain architecture. Due to hardware limitations, the blockchain service nodes are built in five Docker containers on a Linux server to simulate multiple blockchain server nodes. These nodes include two Peer nodes, a Certificate Authority (CA) node, an Orderer node, and a CouchDB node [27].
2.3 Particle swarm optimization
PSO is based on Swarm Intelligence (SI), like the ant colony algorithm, and its idea comes from artificial life and evolutionary computation theory. Kennedy and Eberhart first proposed PSO to solve optimization problems by simulating the social behavior of bird flocks [28]. In a sense, PSO, like evolutionary algorithms, is population-based, and each individual has a fitness value. The adjustment of each individual (called a particle) in PSO is also similar to the crossover operator in evolutionary algorithms, but it stems from the simulation of social behavior rather than the idea of survival of the fittest [29]. Unlike evolutionary algorithms, in PSO each particle benefits from its previous motion history, whereas in evolutionary algorithms no such mechanism exists. Moreover, PSO is simple to implement, with few parameters to adjust [30].
In a PSO system, a population containing a certain number of individuals (often referred to as particles) moves in the search space, where each particle represents a potential solution to a particular optimization problem. The movement of each particle in the population is influenced by the best position it has found itself (individual experience) and by the best position found in its neighborhood (neighborhood experience) [31]. When the neighborhood of a particle is the whole population, the neighborhood best corresponds to the global best particle, and the algorithm is called the global PSO algorithm. Accordingly, if a small neighborhood is used, it is usually called the local PSO algorithm. The global PSO converges quickly but easily falls into a local minimum; by comparison, the local PSO can usually find a better solution but converges more slowly. Additionally, a problem-dependent fitness function is needed to evaluate the performance of each particle in different optimization problems [32].
Denote the fitness function by f. The individual optimal position \({y}_{i}\) of particle i is then updated according to Eq. (10):
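(Eq. (10) itself is not reproduced in the text; the conventional PSO personal-best update for a minimization problem, given here as an assumed reconstruction, is:)

$$y_i(t+1)=\begin{cases} y_i(t), & f\left(x_i(t+1)\right)\ge f\left(y_i(t)\right)\\ x_i(t+1), & f\left(x_i(t+1)\right)< f\left(y_i(t)\right)\end{cases}$$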
\({x}_{i}\) is the current position of particle i. \({y}_{i}\) is the individual optimal position of particle i [33].
Let the particle neighborhood size be l and the particle population size be s. When l < s, the algorithm is the local version of PSO. When l = s, that is, the neighborhood of a particle is the whole population, the algorithm is the global version of PSO, and the optimal position of the population \(\widehat{y}\) can be obtained from the following equation.
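(The equation is not reproduced in the text; the conventional definition of the global best for a minimization problem, stated here as an assumed reconstruction, is:)

$$\widehat{y}(t)\in\left\{y_{1}(t),\ldots,y_{s}(t)\right\},\qquad f\left(\widehat{y}(t)\right)=\min\left\{f\left(y_{1}(t)\right),\ldots,f\left(y_{s}(t)\right)\right\}$$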
Neighborhood topologies influence the performance of the PSO algorithm. Usually, each structure has its advantages, and different structures need to be selected for different practical problems. Star and ring are two commonly used topological structures [34]. The specific steps of the PSO algorithm are shown in Fig. 3:
Figure 3 shows the steps of the PSO algorithm. (1) Initialize the particle swarm velocities and positions, the inertia factor, the acceleration constants, the maximum number of iterations, and the minimum error for algorithm termination. (2) Evaluate the initial fitness value of each particle by substituting it into the objective function. (3) Take the initial fitness value as each particle's current local optimum (dependent variable) and its position as the current local optimal position (independent variable). (4) Take the best of all particles' local optima as the current global optimum value, and its position as the current global optimal position. (5) Substitute into the velocity update expression to update each particle's flying speed; any component exceeding the maximum particle flying speed is clamped to that maximum. (6) Substitute into the displacement update expression to update the position of each particle. (7) Compare each particle's new fitness value with its historical local optimum; if it is better, the current fitness value becomes the particle's local optimum and the corresponding position becomes its local optimal position. (8) Find the best global value in the current particle swarm and take the corresponding position as the global optimal position. (9) Repeat steps (5)-(8) until the set minimum error is met or the maximum number of iterations is reached. (10) Output the global optimum and its position, together with the local optimum and local optimal position of each particle [35].
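The compact NumPy sketch below follows steps (1)-(10) for the global version of PSO (minimization). The inertia weight 0.729 and acceleration coefficient 1.49445 are the values given in Sect. 2.3; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def pso(objective, dim, bounds, n_particles=30, max_iter=500, tol=1e-8,
        w=0.729, c1=1.49445, c2=1.49445):
    """Global-version PSO minimizing `objective`; bounds = (low, high) per dimension."""
    low, high = np.asarray(bounds[0]), np.asarray(bounds[1])
    vmax = 0.2 * (high - low)                        # max speed: 20% of the search range
    x = np.random.uniform(low, high, (n_particles, dim))    # step (1): positions
    v = np.random.uniform(-vmax, vmax, (n_particles, dim))  #           velocities
    fit = np.apply_along_axis(objective, 1, x)       # step (2): initial fitness
    pbest, pbest_fit = x.copy(), fit.copy()          # step (3): local optima
    g = np.argmin(pbest_fit)                         # step (4): global optimum
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(max_iter):
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # step (5)
        v = np.clip(v, -vmax, vmax)                  # clamp to maximum flying speed
        x = np.clip(x + v, low, high)                # step (6): position update
        fit = np.apply_along_axis(objective, 1, x)
        better = fit < pbest_fit                     # step (7): update local optima
        pbest[better], pbest_fit[better] = x[better], fit[better]
        g = np.argmin(pbest_fit)                     # step (8): update global optimum
        if pbest_fit[g] < gbest_fit:
            improvement = gbest_fit - pbest_fit[g]
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
            if improvement < tol:                    # step (9): minimum-error stop
                break
    return gbest, gbest_fit                          # step (10): output the optimum

# Example: minimize the sphere function in 5 dimensions.
# best, val = pso(lambda z: float(np.sum(z ** 2)), dim=5,
#                 bounds=(np.full(5, -10.0), np.full(5, 10.0)))
```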
In the star structure, one particle is selected as the center and is connected to all other particles in the group, while every other particle is connected only to the center. In the ring topology, the particles are distributed in a ring, and each particle is connected to one particle on its left and one on its right. In this work, a von Neumann topology is used to establish a new PSO model; in a von Neumann structure, each particle is connected to the particles in its von Neumann neighborhood [36].
Suppose the current position of a moving particle happens to be the optimal global position. In that case, the velocity update of that particle depends only on its inertia term, which leads to "premature" convergence. Therefore, some researchers proposed an improved method to guarantee PSO convergence to a local optimum. The strategy is to update the global best particle in a new way, resetting that particle's position to the global extremum point, while the other particles are still updated iteratively according to the original equation. Compared with the original PSO algorithm, the convergence speed of this method is greatly improved. The algorithm principle is as follows:
(1) Randomly initialize the whole particle population.
(2) Run the algorithm until it converges to a local optimum, and save the position of this point.
(3) Repeat steps (1) and (2) until the stopping criteria are met [37].
In step (2), the convergence speed of the algorithm is controlled according to the change rate of the objective function, and the calculation of the change rate reads:
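(The change-rate equation is not reproduced in the text. One plausible form, offered only as an assumption, is the normalized change of the global-best objective value between consecutive iterations:)

$$f_{ratio}=\frac{\left|f\left(\widehat{y}^{\,k}\right)-f\left(\widehat{y}^{\,k-1}\right)\right|}{\left|f\left(\widehat{y}^{\,k-1}\right)\right|}$$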
If \({f}_{ratio}\) is less than a self-defined threshold, the counter is incremented by one. When the counter reaches a certain threshold, the particle population is considered to have converged. The iterative equation of particle velocity is as follows:
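(The original equation is not reproduced here. A plausible reconstruction, consistent with the definitions below of the inertia factor w, the personal best \({y}_{id}^{k}\), the neighborhood best \({y}_{jd}^{k}\), and the parameters \({k}_{1}\), \({k}_{2}\), \({k}_{3}\), and stated as an assumption rather than the authors' exact form, is:)

$$v_{id}^{k+1}=w\,v_{id}^{k}+k_{1}r_{1}\left(y_{id}^{k}-x_{id}^{k}\right)+k_{2}r_{2}\left(\widehat{y}_{d}^{\,k}-x_{id}^{k}\right)+k_{3}r_{3}\left(y_{jd}^{k}-x_{id}^{k}\right)$$

where \(r_{1}\), \(r_{2}\), and \(r_{3}\) are random numbers uniformly distributed in [0, 1].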
\(v\) is the velocity vector, and \({y}_{id}^{k}\) is the best historical position of the i-th particle up to the k-th iteration. \(w\) is the inertia factor, which is non-negative; the larger its value, the stronger the global optimization ability, and the smaller its value, the stronger the local optimization ability.
\({k}_{1}\), \({k}_{2}\) and \({k}_{3}\) are self-set parameters. Equation (14) is the calculation equation of \({y}_{jd}^{k}\).
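(Eq. (14) is likewise not reproduced; under the neighborhood topology described in Sect. 2.3, the neighborhood best is conventionally defined as follows, given here as an assumed form:)

$$y_{j}^{k}\in N_{i},\qquad f\left(y_{j}^{k}\right)=\min_{y\in N_{i}} f\left(y\right)$$

where \(N_{i}\) denotes the set of personal best positions of the particles in the neighborhood of particle i.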
The parameter values of PSO are analyzed. (1) Population size N: the population size N affects the algorithm's search ability and computational cost. PSO has low requirements on population size, and a size of 20-40 can already achieve good results; for more difficult problems or specific categories of problems, the number of particles can be 100 or 200. (2) Particle length D: the particle length D is determined by the optimization problem itself and equals the length of the solution. The optimization problem also determines the particle range R, and each dimension can be set with a different range. (3) Maximum speed: the maximum speed of each dimension can generally be set to 10%-20% of the search range of the corresponding dimension. (4) Inertia weight: the inertia weight controls the influence of the previous velocity on the current velocity and is used to balance the exploration and exploitation abilities of the algorithm. It is usually set to decrease linearly from 0.9 to 0.4; there are also nonlinear decreasing schemes, fuzzy-control settings, and schemes that take random values in [0.5, 1.0]. In this work, it is set to 0.729, and \({k}_{1}\), \({k}_{2}\), and \({k}_{3}\) are set to 1.49445, which is conducive to the algorithm's convergence. (5) Termination condition: the termination condition determines when the algorithm stops and is determined by the specific application and the problem itself. It can be a maximum number of cycles (for example 500, 1000, or 5000), a maximum number of function evaluations, obtaining an acceptable solution, or no improvement over a long period of iteration.
2.4 Underwater target image detection and analysis using PSO algorithm
Image segmentation is a key step from image processing to image analysis. Because of its simple principle, thresholding has become the simplest and most commonly used technique in image segmentation. In image research and applications, people are usually interested only in some parts or regions of the image. These parts are often referred to as the target or foreground (the other parts are referred to as the background), and they generally correspond to specific, unique areas in the image. Detection objects must be separated and extracted from the image before they can be further used. In a broad sense, image segmentation groups and clusters image pixels according to similarity criteria on some image features or feature sets (including pixel gray level, color, and texture). The image plane is divided into several non-overlapping regions, each with some internal consistency: the features of pixels in the same region are similar, whereas pixel features change abruptly between different regions.
The key to threshold segmentation is selecting the optimal threshold quickly and effectively. The proposed method utilizes the PSO algorithm to optimize the two threshold segmentation methods. The specific segmentation steps are shown in Fig. 4:
In Fig. 4, the threshold segmentation steps using PSO are as follows. (1) Set the iteration number t to 0. The size of the population S is defined as m, and the population is randomly initialized such that the position \({p}_{i}^{0}\) of each particle satisfies some predefined conditions. (2) According to the specific optimization problem, an appropriate objective function F() is established, and the fitness value F(\({p}_{i}^{t}\)) is evaluated for each particle. (3) Set the position of the particle with the best fitness value in the population as the global best gBest. (4) Adjust the movement velocity of each particle. (5) Adjust the position of each particle. (6) Let t = t + 1 and carry out a new round of iteration. (7) Return to step (2) and repeat until the stopping criterion is met.
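A minimal sketch of these steps is given below, using Otsu's between-class variance as the objective that PSO optimizes for a single threshold. It assumes the `pso` routine sketched in Sect. 2.3 (or any equivalent minimizer) plus OpenCV and NumPy; the objective and names are illustrative assumptions, not the authors' exact formulation.

```python
import cv2
import numpy as np

def between_class_variance(gray, t):
    """Otsu-style objective: larger is better, so PSO minimizes its negative."""
    t = int(np.clip(t, 1, 254))
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    w0, w1 = p[:t].sum(), p[t:].sum()
    if w0 == 0 or w1 == 0:
        return 0.0
    mu0 = (np.arange(t) * p[:t]).sum() / w0
    mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
    return w0 * w1 * (mu0 - mu1) ** 2

def pso_threshold(gray, pso_fn):
    """Steps (1)-(7) of Fig. 4: each particle encodes a candidate threshold in [0, 255]."""
    objective = lambda z: -between_class_variance(gray, z[0])
    best, _ = pso_fn(objective, dim=1,
                     bounds=(np.array([0.0]), np.array([255.0])))
    return int(best[0])

# gray = cv2.cvtColor(cv2.imread("underwater.png"), cv2.COLOR_BGR2GRAY)
# t = pso_threshold(gray, pso)                    # `pso` from the earlier sketch
# mask = (gray > t).astype(np.uint8) * 255        # binarized target / background mask
```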
At the same time, in image recognition, the feature extraction of the target and the design of the classifier are the keys to the whole recognition process. An important property of a computer system for image recognition is invariance to translation, rotation, and scale transformation of the image; thus, invariance recognition is an important task in image recognition: the recognition result should be insensitive to the target's position, orientation, size, and deformation within a reasonable range. There are two ways to realize invariance recognition: (1) the invariance recognition ability of the classifier, and (2) invariance of the extracted features. Generally, both approaches focus on the invariance of the extracted features, namely feature invariants; besides, the classifier also affects the recognition rate. The Artificial Neural Network (ANN), also known as Parallel Distributed Processing, is a network formed by the interconnection of many artificial neurons similar to natural nerve cells. ANN solves problems in a way completely different from traditional statistical methods. It simulates the thinking of the human brain by connecting many neurons into a complex network. The network is trained with known samples, similar to human brain learning. The ANN stores nonlinear relationships between variables, similar to the memory function of the human brain, and the stored network information classifies or predicts unknown samples, similar to the associative function of the human brain. It is an intelligent data processing method, and its ability to deal with nonlinear relational data is unmatched by other methods.
An ANN comprises many neurons with nonlinear mappings connected by weight coefficients. The information of the network is distributed and stored in the connection weight coefficients, which gives the network good parallel processing ability, nonlinear processing ability, and robustness. The model of the neuron, the basic processing unit of the ANN, is shown in Fig. 5:
In Fig. 5, the scalar input P is multiplied by the weight W to obtain WP. This product is fed into the accumulator and added to the bias value B to obtain the net input, which is passed to the transfer function F; the scalar output A is obtained by the operation of F. In the BP neural network, i denotes an input layer node, j denotes a hidden layer node, and k denotes an output layer node. This work targets image recognition of underwater targets, so each input node of the network represents one component of the image feature vector, and the output node represents the category number. The BP algorithm is divided into two stages. In the first stage, the input information passes from the input layer to the output layer, and the output value of each unit is calculated layer by layer. In the second stage (the backpropagation process), the output error of each unit in the hidden layers is calculated backward layer by layer, and this error corrects the weights of the previous layer. The activation function of the network nodes adopts the hyperbolic tangent function, so the input-output relation of the network is defined through this activation function (the forward process). The PSO algorithm is combined with the BP neural network, and the specific flow chart is shown in Fig. 6:
Figure 6 shows the recognition process combining the PSO algorithm and the BP neural network. Firstly, the invariant moment features of the input image are extracted, and the one-dimensional invariant moment feature vector of the target is taken as the recognition feature. After feature extraction and vector standardization, the features are fed into the neural network optimized by PSO for classification and discrimination, and the recognition results are then output.
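The sketch below mirrors the pipeline of Fig. 6 in a compact form: a one-hidden-layer network with the hyperbolic tangent activation is evaluated on standardized feature vectors, and PSO searches its weights by minimizing the training classification error. The feature extraction step (e.g., invariant moments) is assumed to have produced the matrix X; all names, sizes, and the coupling with the earlier `pso` sketch are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def mlp_forward(weights, X, n_in, n_hidden, n_out):
    """One-hidden-layer network with the hyperbolic tangent activation (forward process)."""
    i = 0
    W1 = weights[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = weights[i:i + n_hidden]; i += n_hidden
    W2 = weights[i:i + n_hidden * n_out].reshape(n_hidden, n_out); i += n_hidden * n_out
    b2 = weights[i:i + n_out]
    hidden = np.tanh(X @ W1 + b1)
    return np.tanh(hidden @ W2 + b2)

def classification_error(weights, X, y, n_in, n_hidden, n_out):
    """Objective for PSO: fraction of misclassified training samples."""
    pred = np.argmax(mlp_forward(weights, X, n_in, n_hidden, n_out), axis=1)
    return float(np.mean(pred != y))

# Hypothetical usage with invariant-moment feature vectors X (N x 7) and labels y:
# n_in, n_hidden, n_out = 7, 10, 3
# dim = n_in * n_hidden + n_hidden + n_hidden * n_out + n_out
# obj = lambda w: classification_error(w, X, y, n_in, n_hidden, n_out)
# best_w, err = pso(obj, dim=dim, bounds=(np.full(dim, -1.0), np.full(dim, 1.0)))
# labels = np.argmax(mlp_forward(best_w, X_test, n_in, n_hidden, n_out), axis=1)
```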
3 Results and discussion
3.1 Saliency target outcome analysis
The proposed method is tested on the Water-Net dataset. Four visual saliency algorithms, namely Context-Aware (CA), Histogram-based Contrast (HC), Graph-Regularized (GR), and Spectral Residual (SR), are selected for comparison. The F-measure value (an index evaluating the degree of fit between the predicted saliency map and the ground-truth saliency map) and the Mean Absolute Error (MAE) of each algorithm are calculated and compared in Fig. 7:
Figure 7 reveals that the F value of GR is 0.806, the F value of the algorithm proposed is 0.843, the F value of CA is 0.599, the F value of HC is 0.769, and the F value of SR is 0.777. The MAE value of CA is 0.223, the MAE value of HC is 0.23, the MAE value of GR is 0.217, the MAE value of SR is 0.271, and the MAE value of the algorithm proposed is 0.166. Therefore, both GR and the proposed method have a large F value and small MAE. Compared with other methods, they have greater advantages and the results are consistent with subjective evaluation. Compared with the GR method, the algorithm proposed has a higher F value and lower MAE value. Therefore, the proposed algorithm is superior to the GR method.
3.2 Analysis of underwater image sharpening
The underwater images used in the experiment are derived from a publicly available underwater image dataset containing 950 images. The proposed underwater image target detection algorithm is compared with the Dark Channel Prior (DCP) and Underwater DCP (UDCP) algorithms. For each algorithm, the two-dimensional image entropy, underwater color image quality, and image naturalness are evaluated and compared with the original image in Fig. 8:
Figure 8 compares the restoration results in terms of the two-dimensional entropy of underwater images. The DCP method's test sample results are 9.4434, 9.2938, 9.5743, 9.9875, 9.4773, and 10.3662; the image entropy of images restored by DCP is slightly larger than that of the original images, so the method has a certain effect, but it is not obvious. The test sample results of the UDCP method are 9.789, 10.7861, 9.5438, 10.002, 10.7817, and 10.924, respectively. The test results of the proposed algorithm are 10.885, 10.6195, 9.8634, 10.3524, 11.4143, and 11.0356. Hence, the image entropy of the proposed method is greatly improved over the original images, and the amount of information in the image increases; in particular, its image entropy is the highest of the compared methods, so the proposed method has more advantages. In the evaluation of underwater color image quality, the average score of the original images is 11.95, the average score of the DCP method is 15.13, the average score of the UDCP method is 21.76, and the average score of the proposed underwater target detection algorithm is 27.82. Therefore, the quality of images recovered by DCP is slightly higher than that of the original images, the definition of images recovered by UDCP is greatly improved, and UDCP is better than DCP; meanwhile, the proposed method is superior to both DCP and UDCP in terms of clarity, color, and brightness. Image naturalness is also taken as an indicator. The average naturalness score of the original images is 11.97, the average score of the DCP method is 12.47, the average score of the UDCP method is 11.41, and the average score of the proposed method is 11.27. Hence, UDCP and the proposed method yield high image quality, and in general the effect of the proposed method is the best.
3.3 Image target recognition and tracking analysis
Different experimental samples are selected to study the classification error rate of the proposed underwater image object detection algorithm. At the same time, the gap and the relative error between the proposed method and a common positioning method are compared at different positioning distances in Fig. 9:
The experimental results in Fig. 9 show that the relative errors of the common underwater image target detection and tracking method at 800 mm, 1000 mm, and 1200 mm are 5.41%, 6.10%, and 6.88%, respectively. By comparison, the relative errors of the proposed method are 2.56%, 3.24%, and 3.89%, respectively. These results show that the method has high accuracy. This is because it considers the influence of multiple underwater refractions and obtains complete and more accurate image parameters after calibration, leading to more accurate positioning results.
4 Discussion
First, tests are conducted on the Water-Net dataset. Four visual saliency algorithms, CA, HC, GR, and SR, are selected for comparison, and the F-measure value and MAE of each algorithm are calculated. The results show that, compared with the other algorithms, the proposed algorithm has an F value of 0.843 and an MAE of 0.166; its F value is therefore higher and its MAE lower. Next, the designed underwater image target detection algorithm is compared with the DCP and UDCP algorithms, and the evaluation results of each algorithm for two-dimensional entropy, underwater color image quality, and image naturalness are calculated. The average image quality score of the designed underwater target extraction and tracking algorithm is 27.82, and its average image naturalness score is 11.27, so the overall effect of the designed method is better. Finally, the classification error rate of target recognition is analyzed. The results show that the relative errors of the designed method are 2.56%, 3.24%, and 3.89%, respectively, suggesting high accuracy. To sum up, the designed underwater target detection and tracking algorithm has better extraction accuracy while maintaining image quality to a certain degree.
5 Conclusion
With the rapid progress of multimedia information technology and network technology, research on image processing and pattern recognition is increasingly extensive. Image processing and pattern recognition is a frontier subject with important theoretical and practical value, especially in the field of underwater vision. The development of smart blockchain technology can also help image recognition algorithms further improve recognition accuracy. Combined with the PSO algorithm, this work designs an underwater image target detection and tracking algorithm based on VSA and smart blockchain technology. Meanwhile, the PSO algorithm is employed to further optimize underwater image target detection, and an existing underwater image database is used for simulation experiments. The results show that the designed algorithm has a higher F value, lower MAE, and better performance than the other VSA algorithms. The average image quality score of the designed underwater target extraction and tracking algorithm is 27.82, and the average image naturalness score is 11.27. Compared with other algorithms, the designed algorithm performs better in the overall extraction of underwater images and has better positioning accuracy in locating targets in underwater images. The research innovation is that the results can provide references and suggestions for target detection and tracking in subsequent underwater images, and that combining the visual saliency algorithm with PSO optimizes the quality and efficiency of underwater target extraction. The research limitation is that the research time was short and the sample size limited, so there are still deficiencies in the scope and depth of the investigation, which need to be further addressed. Besides, underwater image target recognition technology keeps pace with the times, and new technologies will constantly be updated and used. In future work, theory and practice will be combined in depth for further research.
Data and materials availability
The data used to support the findings of this study are included within the article and Supplemental Files.
References
Zeng, L., Sun, B., Zhu, D.: Underwater target detection based on Faster R-CNN and adversarial occlusion network. Eng. Appl. Artif. Intel. 100, 104190 (2021)
Qi, J., Gong, Z., Xue, W., Liu, X., Yao, A., Zhong, P.: An unmixing-based network for underwater target detection from hyperspectral imagery. IEEE 14, 5470–5487 (2021)
Wei, X., Yu, L., Tian, S., Feng, P., Ning, X.: Underwater target detection with an attention mechanism and improved scale. Multimed. Tools Appl. 80(25), 33747–33761 (2021)
Qi, J., Gong, Z., Yao, A., Liu, X., Li, Y., Zhang, Y., Zhong, P.: Bathymetric-based band selection method for hyperspectral underwater target detection. Remote Sens. 13(19), 3798 (2021)
Wang, X., Zhu, Y., Li, D., Zhang, G.: Underwater target detection based on reinforcement learning and ant colony optimization. J. Ocean Univ. China 21(2), 323–330 (2022)
Zhang, D., Gao, L., Teng, T., Jia, Z.: Underwater moving target detection using track-before-detect method with low power and high refresh rate signal. Appl. Acoust. 174, 107750 (2021)
Shi, J., Zhuo, X., Zhang, C., Bian, Y.X., Shen, H.: Research on key technologies of underwater target detection. NPTA 11763, 1128–1137 (2021)
Zheng, Y., Yu, M., Liu, R., Liu, Y.: Underwater target detection based on deep neural network and image enhancement. J. Phys. Conf. Ser. 2029(1), 012145 (2021)
Zhang, L., Li, C., Sun, H.: Object detection/tracking toward underwater photographs by remotely operated vehicles (ROVs). Futur. Gener. Comput. Syst. 126, 163–168 (2022)
Jian, M., Liu, X., Luo, H., Lu, X., Yu, H., Dong, J.: Underwater image processing and analysis: a review. Signal Process. Image Commun. 91, 116088 (2021)
Qian, S., Shi, Y., Wu, H., Liu, J., Zhang, W.: An adaptive enhancement algorithm based on visual saliency for low illumination images. Appl. Intell. 52(2), 1770–1792 (2022)
Kannan, S.: Intelligent object recognition in underwater images using evolutionary-based Gaussian mixture model and shape matching. Signal Image Video Proc. 14(5), 877–885 (2020)
Li, X., Camerer, C.F.: Predictable effects of visual salience in experimental decisions and games. Q. J. Econ. 137(3), 1849–1900 (2022)
Krüger, A., Scharlau, I.: The time course of salience: not entirely caused by salience. Jpn. Psychol. Res. 86(1), 234–251 (2022)
Tay, D., Jannati, A., Green, J.J., McDonald, J.J.: Dynamic inhibitory control prevents salience-driven capture of visual attention. J. Exp. Psychol. Hum. Percept. Perform. 48(1), 37 (2022)
Rust, N.C., Cohen, M.R.: Priority coding in the visual system. Nature 23(6), 376–388 (2022)
Yutong, G., Khishe, M., Mohammadi, M., Rashidi, S., Nateri, M.S.: Evolving deep convolutional neural networks by extreme learning machine and fuzzy slime mould optimizer for real-time sonar image recognition. Int. J. Fuzzy Syst. 24(3), 1371–1389 (2022)
Beffara, B., Hadj-Bouziane, F., Hamed, S.B., Boehler, C.N., Chelazzi, L., Santandrea, E., Macaluso, E.: Dynamic causal interactions between occipital and parietal cortex explain how endogenous spatial attention and stimulus-driven salience jointly shape the distribution of processing priorities in 2D visual space. Neuroimage 255, 119206 (2022)
Lawrence, R.K., Pratt, J.: Salience matters: distractors may, or may not, speed target-absent searches. Atten. Percept. Psychophys. 84(1), 89–100 (2022)
Zhao, L., Bo, Q., Zhang, Z., Chen, Z., Wang, Y., Zhang, D.: Altered dynamic functional connectivity in early psychosis between the salience network and visual network. Neuroscience 491, 166–175 (2022)
Verma, G., Kumar, M.: Systematic review and analysis on underwater image enhancement methods, datasets, and evaluation metrics. J. Electron. Imaging 31(6), 060901 (2022)
Pahnehkolaei, S.M.A., Alfi, A., Machado, J.T.: Analytical stability analysis of the fractional-order particle swarm optimization algorithm. Chaos Solitons Fractals 155, 111658 (2022)
Cui, Y., Meng, X., Qiao, J.: A multi-objective particle swarm optimization algorithm based on two-archive mechanism. Appl. Soft Comput. 119, 108532 (2022)
Afroz, Z., Shafiullah, G.M., Urmee, T., Shoeb, M.A., Higgins, G.: Predictive modelling and optimization of HVAC systems using neural network and particle swarm optimization algorithm. Build. Environ. 209, 108681 (2022)
Zhang, J.: Processing and compression of underwater image based on deep learning. Optik 271, 170168 (2022)
Huang, H., Zuo, Z., Sun, B., Wu, P., Zhang, J.: DSA-SOLO: double split attention SOLO for side-scan sonar target segmentation. SN Appl. Sci. 12(18), 9365 (2022)
Hu, P., Pan, J.S., Chu, S.C., Sun, C.: Multi-surrogate assisted binary particle swarm optimization algorithm and its application for feature selection. Appl. Soft Comput. 121, 108736 (2022)
Li, L., Zhang, Y., Fung, J.C., Qu, H., Lau, A.K.: A coupled computational fluid dynamics and backpropagation neural network-based particle swarm optimizer algorithm for predicting and optimizing indoor air quality. Build. Environ. 207, 108533 (2022)
Hu, K., Weng, C., Zhang, Y., Jin, J., Xia, Q.: An overview of underwater vision enhancement: from traditional methods to recent deep learning. J. Mar. Sci. Eng. 10(2), 241 (2022)
Zhang, X., Wang, Z., Lu, Z.: Multi-objective load dispatch for microgrid with electric vehicles using modified gravitational search and particle swarm optimization algorithm. Appl. Energy 306, 118018 (2022)
Fernandes, P.B., Oliveira, R.C.L., Neto, J.F.: Trajectory planning of autonomous mobile robots applying a particle swarm optimization algorithm with peaks of diversity. Appl. Soft Comput. 116, 108108 (2022)
Li, N., Hou, G., Liu, Y., Pan, Z., Tan, L.: Single underwater image enhancement using integrated variational model. Digit. Signal Process. 129, 103660 (2022)
Supreeth, S., Patil, K.: Hybrid genetic algorithm and modified-particle swarm optimization algorithm (GA-MPSO) for predicting scheduling virtual machines in educational cloud platforms. Int. J. Emerg. Technol. 17(7), 208 (2022)
Han, F., Zheng, M., Ling, Q.: An improved multiobjective particle swarm optimization algorithm based on tripartite competition mechanism. Appl. Intell. 52(5), 5784–5816 (2022)
Gao, Q., Xu, H., Li, A.: The analysis of commodity demand predication in supply chain network based on particle swarm optimization algorithm. J. Comput. Appl. Math. 400, 113760 (2022)
Venker, C.E., Neumann, D., Aladé, F.: Visual perceptual salience and novel referent selection in children with and without autism spectrum disorder. Autism Dev. Lang. Impair. 7, 23969415221085476 (2022)
Gaspar, A., Oliva, D., Hinojosa, S., Aranguren, I., Zaldivar, D.: An optimized Kernel extreme learning machine for the classification of the autism spectrum disorder by using gaze tracking images. Appl. Soft Comput. 120, 108654 (2022)
Funding
National Natural Science Foundation of China Youth Fund under Grant 12004293. Aeronautical Science Foundation under Grant 2019ZH0T7001. Key Research and Development Program of Shaanxi under Grant No. 2023-YBGY-019 and 2024GX-YBXM-262.
Author information
Authors and Affiliations
Contributions
Zhang and Bi made substantial contributions to the conception and design of the work. Li was mainly responsible for data acquisition, analysis and interpretation. Bi wrote the main manuscript text. Zhang revised it critically for important intellectual content. All authors reviewed the manuscript, approved the version to be published and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, Y., Bi, Y. & Li, J. Underwater image processing and target detection from particle swarm optimization algorithm. SIViP 19, 132 (2025). https://doi.org/10.1007/s11760-024-03638-8