1 Introduction

The ocean is the origin of life, a crucial space for human survival, and a precious treasure for sustainable development. As society progresses, the understanding of the ocean's strategic position and value keeps deepening. The mineral resources in the ocean mainly include seabed oil, combustible ice, and natural gas [1]. Among these resources, offshore oil and natural gas production has reached one-fourth of global production. The ocean is also rich in nickel, cobalt, and other mineral resources, at concentrations much higher than on land [2]. However, humans have exploited only about 5% of the ocean, so it is of great significance to explore and exploit it further. Nowadays, facing the pressure of a continuing shortage of land resources, human beings focus more on the development and rational application of marine resources. Owing to the particularity of the marine environment, ocean exploration and development are often beyond human capacity, and various underwater robots have come into being to replace people in underwater operations [3]. To improve the understanding of underwater images and expand exploration of the marine field, this work studies and optimizes the underwater target detection and tracking algorithm based on existing research, helping to improve the quality and efficiency of underwater target detection.

Underwater robot technology has gradually become a research hotspot and trend. Underwater robots can replace humans in underwater operations such as marine fishery exploration, seawater sampling, and resource exploration. They can also break through the limitations of human underwater operations, working at great depths and in harsh environments for long periods. For example, China's manned submersible "Striver" has carried out deep-sea operations many times, with a maximum diving depth of more than 10,000 m [4]. As ocean exploration deepens, the role of underwater robots will become more important, and they have broad application prospects. At present, the most widely used unmanned underwater vehicles are divided into cabled remotely operated vehicles and uncabled autonomous underwater vehicles. A cabled remotely operated vehicle is controlled from the surface and carries operating tools such as thrusters, underwater television, and underwater manipulators, all powered from the surface. In contrast, a cable-free Autonomous Underwater Vehicle (AUV) carries its own energy, possesses a degree of intelligence, and can automatically complete navigation planning, obstacle avoidance, and operation implementation according to the underwater environment and the tasks at hand [5]. The environment sensing layer of an AUV is the basis of intelligent underwater vehicles. Therefore, underwater target detection based on AUVs has important research significance. Accurate identification of underwater targets is a difficulty in AUV-based target detection and an important research topic in computer vision [6]. Underwater target detection mainly relies on optical, acoustic, and magnetic detection technologies. A single sensor is affected by various factors that reduce the system's reliability and identification accuracy, leading to misjudgment and misidentification of underwater targets.
Multi-sensor information fusion can effectively improve the system's robustness, expand the observation range, enhance data credibility, and improve the system's recognition ability [7]. The underwater target detection system based on the PSO algorithm is image-based: it adopts image preprocessing, segmentation, morphological processing, feature extraction, and other techniques to complete target recognition and realize machine vision. The detection objects in an underwater scene usually differ from the background, such as obstacles on the route, companions in cooperative underwater robot operations, and targets in underwater fishing operations [8]. Unlike general feature extraction methods, the target detection method based on visual saliency analysis (VSA) compares the target with the image background. Hence, it is more suitable for underwater scenes with relatively simple backgrounds [9]. Jian et al. [10] surveyed and summarized the existing relatively mature and representative underwater image processing models and divided them into seven categories: enhancement, defogging, noise reduction, segmentation, salient object detection, color constancy, and restoration. They then objectively evaluated the current state and future development trends of underwater image processing [10]. Qian et al. [11] converted the original low-illumination image from the red-green-blue color space to the hue-saturation-intensity color space. Then, the image's overall brightness was adaptively improved using a bilateral gamma correction function and the cuckoo search algorithm. In addition, a brightness-preserving double-histogram construction based on a visual saliency algorithm was proposed to perform brightness conservation and contrast analysis for low-illumination color images. Finally, the processed color space was converted back to obtain the enhanced image [11].
Kannan [12] tried to identify objects in underwater images using an adaptive Gaussian mixture model, which performs accurate object segmentation with a predefined number of clusters. The initialization of the parameter set by optimization techniques such as the genetic algorithm, PSO, and differential evolution was analyzed. Differential evolution is known for accurate decision-making in fewer iterations and proved more suitable for initializing the number of clusters of the Gaussian mixture model. The segmented objects were then recognized using inner-distance shape-matching technology [12]. To sum up, previous studies show that in the field of underwater target detection and tracking, most scholars apply visual saliency theory, smart blockchain technology, or PSO alone, and few combine these technologies. Besides, visual saliency analysis alone cannot handle more complex images, so its applicability is limited. Therefore, this work combines these methods and theories. On the basis of ensuring the quality of underwater image extraction, the theory and application scope of the research methods are expanded.

Underwater image target detection and tracking technology is an important part of underwater optical vision technology. Image segmentation has always been a classic problem in image processing, especially for underwater images, and the tracking and recognition of the target image is the key step of the whole target recognition system. According to the specific content of underwater image target detection and tracking, this work takes the underwater image target fusion detection and tracking method during navigation as the starting point. Based on visual saliency theory and smart blockchain technology, the image target detection algorithm and autonomous tracking are studied in combination with PSO. The research results expand the applicable field of underwater target detection algorithms, improve the quality and efficiency of underwater target detection, and provide further technical support and suggestions for subsequent autonomous underwater detection by robots. Meanwhile, they provide technical support for subsequent marine exploration and marine resource collection.

2 Experimental methods and procedures

2.1 Visual saliency analysis

VSA simulates the human visual attention mechanism, which ignores irrelevant areas in the image and focuses attention on the object of interest. Visual saliency detection highlights the salient object in the visual scene and obtains the binarized mask image of the object after segmentation [13]. Visual saliency detection is shown in Fig. 1:

Fig. 1

Visual Saliency detection map a Underwater target map; b visual saliency processing; c Target recognition result map

Underwater scenes are usually simple; the background is mostly water or seabed sand. In the underwater target detection work, the target is generally different from the background. Therefore, the VSA method can be used to detect objects in simple underwater scene images [14].

According to the information processing mechanism, there are two kinds of visual saliency detection. One is the bottom-up approach, which is data-driven: objects that contrast strongly with their surrounding area in the visual field will be noticed [15]. This approach has no task guidance and is usually driven by underlying visual features such as color, intensity, orientation, and texture. The other is the top-down approach, which is task-driven and related to preset targets, such as expected information, color, and object features [16].

The classic Itti model belongs to the bottom-up category, and the extracted features are color, brightness, and orientation. This work analyzes the visual saliency detection process based on the Itti model [17]. The Itti model is a visual attention model designed based on the early primate visual nervous system. The model first uses Gaussian sampling to construct Gaussian pyramids of image color, brightness, and orientation. Then, it uses the pyramids to calculate the brightness, color, and orientation feature maps of the image. Finally, the brightness, color, and orientation saliency maps are obtained by combining the feature maps at different scales, and the final visual saliency map is obtained by adding them together. The specific structure of the Itti model is shown in Fig. 2:

Fig. 2

Itti model a Modules contained in Itti; b Structure diagram of Itti model

In Fig. 2, the model extracts primary visual features (color, brightness, and orientation) from an input image. It uses center-surround operations at multiple scales to produce feature maps that embody saliency measures. After these feature maps are combined, the most salient spatial position in the image is obtained through the biologically inspired winner-take-all competition mechanism, and the position of attention is selected. Finally, the inhibition-of-return method completes the focus shift [18]. The model mainly includes feature extraction, saliency map generation, and attention focus shifting. (1) Feature extraction: the feature extraction module extracts the primary visual features from the external input image. The primary visual features mainly include color, motion, orientation, and brightness. (2) Saliency map generation: images of natural environments contain much redundancy, and the brain can effectively remove this redundancy and focus on useful information. When the visual attention computational model processes an image, it likewise needs to eliminate the redundant information in the natural image to prepare for the extraction of salient information. The information contained in a still image can be divided into two types: salient information and background information. Salient information can be obtained by removing the redundant information in the image. After the brightness, color, and orientation feature saliency maps are obtained, they are fused to obtain the final comprehensive saliency map. (3) Attention focus shifting: after the final saliency map is generated, each target displayed in the saliency map attracts the attention focus through a competition mechanism, realized by the Winner-Take-All (WTA) rule. The winner detected by the WTA mechanism, i.e., the target with the highest saliency, becomes the focus of attention. Since the target to be noticed is always the most salient among all the targets participating in the competition and will always win, without a specific control mechanism the focus would always point to the same target, other targets would never be noticed, and the focus of attention could not shift. Attention and focus shifting are achieved through the inhibition-of-return mechanism, the principle of transferring to nearby locations, and the determination of the size of the attention area [19].
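The WTA selection with inhibition of return described above can be sketched as follows (a minimal sketch; the neighborhood radius and the suppression-by-masking scheme are illustrative assumptions, not the paper's exact mechanism):

```python
import numpy as np

def shift_attention(saliency, n_focus=3, inhibit_radius=1):
    """Pick successive attention foci via winner-take-all, suppressing each
    winner's neighborhood (inhibition of return) so the focus can shift."""
    s = saliency.astype(float).copy()
    foci = []
    for _ in range(n_focus):
        r, c = np.unravel_index(np.argmax(s), s.shape)  # WTA: global maximum wins
        foci.append((int(r), int(c)))
        # Inhibition of return: mask out the winner's neighborhood.
        r0, r1 = max(0, r - inhibit_radius), min(s.shape[0], r + inhibit_radius + 1)
        c0, c1 = max(0, c - inhibit_radius), min(s.shape[1], c + inhibit_radius + 1)
        s[r0:r1, c0:c1] = -np.inf
    return foci

sal = np.zeros((5, 5))
sal[1, 1], sal[3, 4] = 0.9, 0.6  # two salient targets
print(shift_attention(sal, n_focus=2))  # → [(1, 1), (3, 4)]
```

Without the masking step, the focus would stay locked on the most salient target, which is exactly the failure mode the inhibition-of-return mechanism prevents.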

Firstly, feature extraction is carried out. A nine-layer Gaussian pyramid is constructed, and the size of the Gaussian filter is 5 × 5. The original image is used as the 0th layer for layer-by-layer filtering to obtain the next layer. Then features are extracted for each layer [20]. The specific calculation is shown in Eqs. (1)–(5):

$$\text{I}=\frac{\text{r}+\text{g}+\text{b}}{3}$$
(1)
$$\text{R}=\text{r}-\frac{\text{g}+\text{b}}{2}$$
(2)
$$\text{G}=\text{g}-\frac{\text{r}+\text{b}}{2}$$
(3)
$$\text{B}=\text{b}-\frac{\text{r}+\text{g}}{2}$$
(4)
$$\text{Y}=\text{r}+\text{g}-2(\left|\text{r}-\text{g}\right|+\text{b})$$
(5)

R, G, B, and Y are the image's red, green, blue, and yellow color characteristics. r, g, and b are the intensity values of the image's red, green, and blue channels. I is the brightness characteristic. Then, the feature maps are generated. According to the "center-surround difference", the difference between images at different layers of the pyramid is calculated to obtain the brightness, color, and orientation feature maps. Before the difference is calculated, the images are unified to the same size by interpolating the smaller image [21]. The calculation method is as follows.

$$ {\text{I}}\left( {{\text{c}},{\text{s}}} \right) = \left| {{\text{I}}\left( {\text{c}} \right)\Theta {\text{I}}\left( {\text{s}} \right)} \right| $$
(6)
$$ {\text{RG}}\left( {{\text{c}},{\text{s}}} \right) = \left| {\left( {{\text{R}}\left( {\text{c}} \right) - {\text{G}}\left( {\text{c}} \right)} \right)\Theta \left( {{\text{G}}\left( {\text{s}} \right) - {\text{R}}\left( {\text{s}} \right)} \right)} \right| $$
(7)
$$ {\text{BY}}\left( {{\text{c}},{\text{s}}} \right) = \left| {\left( {{\text{B}}\left( {\text{c}} \right) - {\text{Y}}\left( {\text{c}} \right)} \right)\Theta \left( {{\text{Y}}\left( {\text{s}} \right) - {\text{B}}\left( {\text{s}} \right)} \right)} \right| $$
(8)
$$ {\text{O}}\left( {{\text{c}},{\text{s}},{\uptheta }} \right) = \left| {{\text{O}}\left( {{\text{c}},{\uptheta }} \right)\Theta {\text{O}}\left( {{\text{s}},{\uptheta }} \right)} \right| $$
(9)

Θ denotes the center-surround (across-scale) difference operation: the surround-scale image is interpolated to the size of the center-scale image and then subtracted from it point by point [22]. RG (c,s) and BY (c,s) are the red-green and blue-yellow color feature maps. I (c,s) is the brightness feature map. \(\text{O (c,\,s,}\,{\uptheta})\) is the orientation feature map. c is the index of the center layer, s is the index of the surround layer, and \(\uptheta \) is the orientation of the Gabor filter used for the orientation features.
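For concreteness, Eqs. (1)-(5) and the interpolate-then-subtract center-surround operation of Eqs. (6)-(9) can be sketched in NumPy (a minimal sketch assuming an RGB image with channels in [0, 1] and pyramid levels whose sizes divide evenly; Itti's original additionally rectifies negative channel values to zero, which is omitted here):

```python
import numpy as np

def color_intensity_features(img):
    """Brightness and color-opponency features of Eqs. (1)-(5) for an RGB
    image with channels in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    I = (r + g + b) / 3.0                  # Eq. (1): brightness
    R = r - (g + b) / 2.0                  # Eq. (2): red
    G = g - (r + b) / 2.0                  # Eq. (3): green
    B = b - (r + g) / 2.0                  # Eq. (4): blue
    Y = r + g - 2.0 * (np.abs(r - g) + b)  # Eq. (5): yellow
    return I, R, G, B, Y

def center_surround(center, surround):
    """Across-scale difference of Eqs. (6)-(9): upsample the coarser
    (surround) map to the center map's size, then subtract point by point."""
    rep = (center.shape[0] // surround.shape[0], center.shape[1] // surround.shape[1])
    up = np.kron(surround, np.ones(rep))  # nearest-neighbor interpolation
    return np.abs(center - up)

I, R, G, B, Y = color_intensity_features(np.array([[[1.0, 0.0, 0.0]]]))  # pure red
print(round(float(I[0, 0]), 3), float(R[0, 0]))  # → 0.333 1.0
```

A bilinear or bicubic interpolation would be closer to practical implementations; nearest-neighbor upsampling is used here only to keep the sketch dependency-free.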

Finally, a saliency map is generated. First, the feature maps of each feature are normalized and combined to obtain the brightness, color, and orientation saliency maps. Then, these saliency maps are normalized and summed to obtain the comprehensive saliency map.

In the preliminary detection, the acquisition of a saliency map mainly considers brightness, color, orientation, and other factors. The depth factor also affects the saliency of underwater vision. Next, the preliminary detection results are combined with depth information to generate a comprehensive saliency map. The depth features are used to refine and optimize the saliency map, improving its quality [23].

2.2 Image recognition system based on smart blockchain technology

Blockchain technology is a new distributed infrastructure and computing paradigm. It uses a block-and-chain data structure to verify and store data, a distributed node consensus algorithm to generate and update data, cryptography to ensure the security of data transmission and access, and smart contracts composed of automated script code to program and operate on data. In short, in a blockchain system, the transaction data generated by each participant are packaged into a data block after some time. The data blocks are arranged in chronological order to form a chain, and all participants hold the same data chain, which cannot be tampered with unilaterally [24]. Any modification of information can only be carried out with the consent of an agreed proportion of subjects. Besides, only new information can be added; old information cannot be deleted or modified. Blockchain can achieve information sharing and consistent decision-making among multiple subjects and ensure that the identity of each subject and the transactions between subjects are tamper-proof, open, and transparent. Compared with traditional networks, blockchain has two core characteristics: data are difficult to tamper with, and the system is decentralized. Based on these two characteristics, the information recorded by the blockchain is more authentic and reliable [25].

The proposed image recognition system based on smart blockchain technology adopts the form of a Web application with a front-end/back-end separation architecture. The front end is implemented using the Vue framework. Vue's core is the view layer, which is easy to use and to integrate with third-party libraries or existing projects; its componentization makes it easy to create single-page applications. The back-end blockchain includes blockchain services and tamper detection services. A block is equivalent to a storage unit. Taking Bitcoin as an example, Bitcoin transaction records are stored in blocks. The Bitcoin system generates a block every 10 min, and each data block generally includes two parts: the block header and the block body. The block header stores the hash values of the previous and current blocks, linking the blocks to each other, similar to a "linked list" in data structures [26].
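The header/body linkage described above can be illustrated with a minimal hash chain (a toy sketch using only Python's standard library, not Hyperledger Fabric's actual block format; the field names are illustrative):

```python
import hashlib, json, time

def make_block(prev_hash, payload):
    """One block of a toy hash chain: the header stores the previous block's
    hash and a hash of the body, mirroring the header/body split above."""
    body = {"payload": payload}
    header = {
        "prev_hash": prev_hash,
        "timestamp": time.time(),
        "body_hash": hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest(),
    }
    header["hash"] = hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()
    return {"header": header, "body": body}

def verify_chain(chain):
    """Valid iff every header's prev_hash equals the previous block's hash,
    so tampering with any block breaks all later links."""
    return all(chain[i]["header"]["prev_hash"] == chain[i - 1]["header"]["hash"]
               for i in range(1, len(chain)))

genesis = make_block("0" * 64, "genesis")
chain = [genesis, make_block(genesis["header"]["hash"], "image-recognition record")]
print(verify_chain(chain))  # → True
chain[0]["header"]["hash"] = "f" * 64  # tamper with the first block's hash
print(verify_chain(chain))  # → False
```

This is the property the tamper detection service relies on: altering any stored record invalidates every subsequent link in the chain.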

The blockchain service of this system adopts the Hyperledger Fabric blockchain architecture. Due to hardware limitations, this work builds the blockchain service nodes in five Docker containers on a Linux server to simulate multiple blockchain server nodes. These nodes include two Peer nodes, a Certificate Authority (CA) node, an Orderer node, and a CouchDB node [27].

2.3 Particle swarm optimization

PSO is based on Swarm Intelligence (SI), like the ant colony algorithm; its idea comes from artificial life and evolutionary computing theory. Kennedy and Eberhart first proposed PSO to optimize problems by simulating the social behavior of bird flocks [28]. In a sense, PSO, like evolutionary algorithms, is population-based, and each individual has a fitness function value. The adjustment of each individual (called a particle) in PSO is also similar to the crossover operator in evolutionary algorithms, but it stems from the simulation of social behavior rather than the idea of survival of the fittest [29]. Unlike evolutionary algorithms, in PSO each particle benefits from its own motion history, whereas in evolutionary algorithms no such mechanism exists. Moreover, PSO is simple to implement, with few parameters to adjust [30].

In a PSO system, a population containing a certain number of individuals (often referred to as particles) moves in the search space, and each particle represents a potential solution to the optimization problem at hand. The movement of each particle is influenced by the best position it has found in its own history (individual experience) and the best position found in its neighborhood (neighborhood experience) [31]. When the neighborhood of a particle is the whole particle population, the neighborhood's optimal position corresponds to the global optimal particle, and the algorithm is called the global PSO algorithm. Accordingly, if a small neighborhood is used, the algorithm is usually called the local PSO algorithm. The global PSO converges fast but easily falls into a local minimum. By comparison, the local PSO can usually search for a better solution, but it is slower. Additionally, a problem-related fitness function is needed to evaluate the performance of each particle in different optimization problems [32].

Denote the fitness function by f. The individual optimal position \({y}_{i}\) of particle i is then updated according to Eq. (10):

$${y}_{i}\left(t+1\right)=\left\{\begin{array}{ll}{y}_{i}\left(t\right)\quad\qquad if\quad f({x}_{i}(t+1))\ge f({y}_{i}(t))\\ {x}_{i}\left(t+1\right) \quad if\quad f({x}_{i}(t+1))< f({y}_{i}(t))\end{array}\right.,$$
(10)

\({x}_{i}\) is the current position of particle i. \({y}_{i}\) is the individual optimal position of particle i [33].

Let the particle neighborhood size be l and the particle population size be s. When l < s, the PSO algorithm is the local version. When l = s, that is, when the neighborhood of each particle is the whole population, the PSO algorithm is the global version, and the optimal position of the population \(\widehat{y}\) can be obtained from the following equation.

$$\widehat{y}\left(t\right)\in \left\{{y}_{0},\dots ,{y}_{s}\right\},\quad f\left(\widehat{y}\left(t\right)\right)=min\left\{f\left({y}_{0}\right),\dots ,f({y}_{s})\right\}$$
(11)

Neighborhood topologies influence the performance of the PSO algorithm. Usually, each structure has its advantages, and different structures need to be selected for different practical problems. Star and ring are two commonly used topological structures [34]. The specific steps of the PSO algorithm are shown in Figure 3:

Fig. 3

Steps of the PSO algorithm

Figure 3 shows the steps of the PSO algorithm. (1) Initialize the particle swarm velocities and positions, the inertia factor, the acceleration constants, the maximum number of iterations, and the minimum error for algorithm termination. (2) Evaluate the initial fitness value of each particle by substituting its position into the objective function. (3) Take the initial fitness value as each particle's current local optimum (dependent variable) and its position as the current local optimal position (independent variable). (4) Take the best local optimum among all particles as the current global optimum and its position as the current global optimal position. (5) Update each particle's flying velocity according to the velocity update equation, and clamp any velocity that exceeds the maximum particle velocity. (6) Substitute into the displacement update expression to update the position of each particle. (7) Compare each particle's new fitness value with its historical local optimum; if it is better, the current fitness value becomes the particle's local optimum, and the corresponding position becomes the local optimal position. (8) Find the global optimum in the current particle swarm and take the corresponding position as the global optimal position. (9) Repeat steps (5)-(8) until the set minimum error is met or the maximum number of iterations is reached. (10) Output the global optimum and its position, as well as the local optima and local optimal positions of the particles [35].
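The steps above can be sketched as a global-version PSO (a minimal sketch; the sphere objective, bounds, velocity limit, and swarm size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(f, dim, n_particles=30, iters=200, w=0.729, c1=1.49445, c2=1.49445,
        bounds=(-5.0, 5.0), v_max=1.0):
    """Global-version PSO following steps (1)-(10) above."""
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))         # (1) initialize positions
    v = rng.uniform(-v_max, v_max, (n_particles, dim))  # ... and velocities
    p_best = x.copy()                                   # (3) local optimal positions
    p_val = np.array([f(xi) for xi in x])               # (2) initial fitness values
    g = p_best[np.argmin(p_val)].copy()                 # (4) global optimal position
    for _ in range(iters):
        r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g - x)  # (5) velocity update
        v = np.clip(v, -v_max, v_max)                   # clamp to the maximum velocity
        x = x + v                                       # (6) position update
        val = np.array([f(xi) for xi in x])
        better = val < p_val                            # (7) update local optima
        p_best[better], p_val[better] = x[better], val[better]
        g = p_best[np.argmin(p_val)].copy()             # (8) update global optimum
    return g, float(f(g))                               # (10) output

best, best_val = pso(lambda z: float(np.sum(z ** 2)), dim=3)
print(best_val)  # very close to 0: the sphere minimum at the origin is found
```

The inertia weight 0.729 and acceleration constants 1.49445 match the parameter values discussed later in this section.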

In the star structure, one particle is selected as the center and is connected to all the other particles in the group, while every other particle is connected only to the center. In the ring topology, the particles are circularly distributed, and each particle is connected to one particle on its left and another on its right. In this work, a new PSO model is established using the von Neumann topology, in which each particle is connected to the particles in its immediate neighborhood [36].

Suppose the current position of a moving particle happens to be the optimal global position. In that case, the particle's velocity update depends only on its inertia term, which can lead to premature convergence. Therefore, some researchers proposed an improved method to guarantee PSO convergence to a local optimum. The strategy is to update the optimal global particle in a new way and reset the particle's position to the global extremum point, while the other particles are still iteratively updated according to the original equation. Compared with the original PSO algorithm, the convergence speed of this method is greatly improved. The algorithm principle is as follows:

  (1) Randomly initialize the whole particle population.

  (2) Run the algorithm until it converges to a local optimum, and save the position of this point.

  (3) Repeat steps (1) and (2) until the stopping criteria are met [37].

In step (2), the convergence speed of the algorithm is controlled according to the change rate of the objective function, and the calculation of the change rate reads:

$${f}_{ratio}=\frac{f\left(\widehat{y}\left(t\right)\right)-f(\widehat{y}(t-1))}{f(\widehat{y}(t))}$$
(12)

If \({f}_{ratio}\) is less than a self-defined threshold, the counter is incremented by one. When the counter reaches a certain threshold, the particle population is considered to have converged. The iterative equation of particle velocity is as follows:

$${v}_{id}^{k+1}=w\bullet {v}_{id}^{k}+{k}_{1}\left({y}_{id}^{k}-{x}_{id}^{k}\right)+{k}_{2}\left({\widehat{y}}_{d}-{x}_{id}^{k}\right)+{k}_{3}({y}_{jd}^{k}-{x}_{id}^{k})$$
(13)

\(v\) is the velocity vector, and \({y}_{id}^{k}\) is the optimal historical position of the i-th particle up to the k-th iteration. \(w\) is the inertia factor, which is non-negative: the larger its value, the stronger the global optimization ability; the smaller its value, the stronger the local optimization ability.

\({k}_{1}\), \({k}_{2}\) and \({k}_{3}\) are self-set parameters. The neighbor particle j in Eq. (13) is selected, dimension by dimension, as the particle that maximizes the fitness-distance ratio given in Eq. (14).

$$FDR=\frac{f\left({x}_{i}^{k}\right)-f({y}_{j}^{k})}{\left|{y}_{jd}^{k}-{x}_{id}^{k}\right|}$$
(14)

The parameter values of PSO are analyzed as follows. (1) Population size N: the population size affects the algorithm's search ability and computational cost. PSO has low requirements on population size, and a size of 20-40 can achieve good results; however, for more difficult problems or specific problem categories, the number of particles can be 100 or 200. (2) Particle length D: the particle length is determined by the optimization problem itself and equals the length of the solution. The particle range R is likewise determined by the optimization problem, and each dimension can be given a different range. (3) Maximum velocity: each dimension of the maximum velocity can generally take 10%-20% of the search range of the corresponding dimension. (4) Inertia weight: the inertia weight controls the influence of the previous velocity on the current one and is used to balance the algorithm's exploration and exploitation abilities. Generally, it decreases linearly from 0.9 to 0.4; nonlinear decrease schemes also exist, and it can be set by fuzzy control or take random values in [0.5, 1.0]. In this work, the inertia weight is set to 0.729, and \({k}_{1}\), \({k}_{2}\), and \({k}_{3}\) are set to 1.49445, which is conducive to the algorithm's convergence. (5) Termination condition: the termination condition determines when the algorithm stops and depends on the specific application and problem. The maximum number of cycles is typically set to 500, 1000, or 5000, or a maximum number of function evaluations is used. The algorithm can also terminate once an acceptable solution is obtained, or when no improvement has been observed over a long run of iterations.
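The change-rate convergence test of Eq. (12) can be sketched as a simple counter over the history of global-best fitness values (the threshold and patience values here are illustrative assumptions):

```python
def converged(history, threshold=1e-4, patience=10):
    """Count successive iterations where the relative change of the global
    best (Eq. (12)) stays below a self-defined threshold; the population is
    considered converged once the counter reaches the patience value."""
    counter = 0
    for prev, curr in zip(history, history[1:]):
        f_ratio = abs((curr - prev) / curr) if curr != 0 else 0.0
        counter = counter + 1 if f_ratio < threshold else 0
        if counter >= patience:
            return True
    return False

print(converged([1.0] * 20))       # → True: no improvement for 10+ steps
print(converged([9.0, 3.0, 1.0]))  # → False: still improving rapidly
```

The absolute value is taken here because for a minimization problem the numerator of Eq. (12) is non-positive; the restart strategy described above would reinitialize the population once this test returns true.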

2.4 Underwater target image detection and analysis using PSO algorithm

Image segmentation is a key step from image processing to image analysis. Because of its simple principle, thresholding has become the simplest and most commonly used technique in image segmentation. In the research and application of images, people are usually interested only in some parts or regions of the image. These parts are often referred to as the target or foreground (the other parts are the background), and they generally correspond to specific, unique areas in the image. Detection objects must be separated and extracted from the image before they can be used further. In a broad sense, image segmentation groups and clusters image pixels according to similarity criteria on some image features or feature sets (including pixel gray level, color, and texture). The image plane is divided into several non-overlapping regions, each with some internal consistency: the features of pixels within the same region are similar, while pixel features change abruptly across the boundaries between different regions.

The key to threshold segmentation is selecting the optimal threshold quickly and effectively. The proposed method utilizes the PSO algorithm to optimize the two threshold segmentation methods. The specific segmentation steps are shown in Fig. 4:

Fig. 4

Step diagram of the threshold segmentation algorithm

In Fig. 4, the threshold segmentation steps using PSO are as follows. (1) Set the iteration number t to 0, define the population size S as m, and randomly initialize the population so that the initial position \({p}_{i}^{0}\) of each particle satisfies predefined conditions. (2) According to the specific optimization problem, establish an appropriate objective function F() and estimate the fitness value F(\({p}_{i}^{t}\)) of each particle. (3) Set the position of the particle with the optimal fitness value in the population as the global best gBest. (4) Adjust the movement velocity of each particle. (5) Adjust the position of each particle. (6) Let t = t + 1 to start a new round of iteration. (7) Return to step (2) and repeat until the stopping criterion is met.
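The steps above can be sketched for single-threshold segmentation (a minimal sketch; the paper's exact objective functions are not reproduced, so Otsu's between-class variance is used here as an illustrative fitness, and the histogram is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

def otsu_objective(hist, t):
    """Between-class variance for threshold t (to be maximized); one common
    fitness choice for threshold segmentation."""
    p = hist / hist.sum()
    w0, w1 = p[:t].sum(), p[t:].sum()
    if w0 == 0 or w1 == 0:
        return 0.0
    levels = np.arange(len(hist))
    mu0 = (levels[:t] * p[:t]).sum() / w0   # mean gray level of the background
    mu1 = (levels[t:] * p[t:]).sum() / w1   # mean gray level of the target
    return w0 * w1 * (mu0 - mu1) ** 2

def pso_threshold(hist, n_particles=15, iters=40, w=0.729, c=1.49445):
    """Steps (1)-(7) of Fig. 4 applied to one-threshold selection."""
    x = rng.uniform(1, len(hist) - 1, n_particles)                 # (1) init positions
    v = rng.uniform(-5, 5, n_particles)
    fit = np.array([otsu_objective(hist, int(t)) for t in x])      # (2) fitness
    p_best, p_val = x.copy(), fit.copy()
    g = p_best[np.argmax(p_val)]                                   # (3) global best
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = w * v + c * r1 * (p_best - x) + c * r2 * (g - x)       # (4) velocity
        x = np.clip(x + v, 1, len(hist) - 1)                       # (5) position
        fit = np.array([otsu_objective(hist, int(t)) for t in x])
        better = fit > p_val
        p_best[better], p_val[better] = x[better], fit[better]     # (7) best updates
        g = p_best[np.argmax(p_val)]
    return int(g)

# A bimodal histogram: dark background around level 40, bright target around 200.
hist = np.zeros(256)
hist[30:50], hist[190:210] = 100, 60
t = pso_threshold(hist)
print(50 <= t <= 190)  # the optimal threshold lies in the gap between the modes
```

For a single integer threshold an exhaustive search over 256 levels is of course trivial; PSO pays off when several thresholds are optimized jointly and the search space grows combinatorially.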

Meanwhile, in image recognition, the feature extraction of the target and the design of the classifier are the keys to the whole recognition process. An important requirement for a computer system in image recognition is invariance to the image's translation, rotation, and scale transformation. Thus, invariant recognition is an important task: the recognition result should be insensitive to the target's position, orientation, size, and deformation within a reasonable range. There are two ways to realize invariant recognition: (1) giving the classifier invariant recognition ability; (2) making the extracted features invariant. Generally, the focus is on the invariance of the extracted features, namely feature invariants; besides, the classifier also affects the recognition rate. The Artificial Neural Network (ANN), also known as Parallel Distributed Processing, is a network formed by the interconnection of many artificial neurons similar to natural nerve cells. ANN solves problems in a way completely different from traditional statistical methods. It simulates the human brain's thinking by connecting many neurons into a complex network. The network is trained with known samples, similar to human brain learning; it stores nonlinear relationships between variables, similar to the memory function of the human brain; and the stored network information classifies or predicts unknown samples, similar to the associative function of the human brain. It is an intelligent data processing method, and its ability to handle nonlinearly related data is unmatched by other methods.

ANN comprises many neurons with nonlinear mapping connected by weight coefficients. The information of the network is distributed and stored in the connection weight coefficient, which makes the network have good parallel processing ability, nonlinear processing ability, and robustness. The basic processing unit of ANN, the neuron's model, is shown in Fig. 5:

Fig. 5

ANN structure diagram a Neuron structure diagram; b Structure diagram of Backpropagation (BP) neural network
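In code, the neuron of Fig. 5 computes the net input WP + B and passes it through the transfer function F. A minimal sketch follows, using the hyperbolic tangent as F (the activation adopted by the network nodes in this work); the input, weight, and bias values are illustrative.

```python
# Single-neuron model: net input N = W*P + B, scalar output A = F(N).
# The hyperbolic tangent is used as the transfer function F here;
# the numeric values are illustrative.
import numpy as np

def neuron(p, w, b, f=np.tanh):
    n = np.dot(w, p) + b   # accumulator: weighted input plus bias
    return f(n)            # scalar output A = F(N)

# Two-component input vector, matching weights, and a small bias.
a = neuron(p=np.array([0.5, -0.2]), w=np.array([1.0, 2.0]), b=0.1)
```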

In Fig. 5, the scalar input P is multiplied by the weight W to give WP. This product is fed into the accumulator and added to the bias value \(\text{B}\), yielding the net input. The net input is passed to the transfer function \(\text{F}\), whose operation produces the scalar output \(\text{A}\). In the BP neural network of Fig. 5b, i denotes an input layer node, j denotes a hidden layer node, and k represents an output layer node. This work targets image recognition of underwater targets, so each input node of the network represents one component of the image feature vector, and each output node represents a category number. The BP algorithm is divided into two stages. In the first stage (the forward process), the input information passes from the input layer to the output layer, and the output value of each layer's units is computed layer by layer. In the second stage (the backpropagation process), the output error of each unit in the hidden layers is calculated backward layer by layer, and this error is used to correct the weights of the preceding layer. The activation function of the network nodes adopts the hyperbolic tangent function, so the input–output relation of the network is defined through this activation function in the forward process. The PSO algorithm is combined with the BP neural network, and the specific flow chart is shown in Fig. 6:

Fig. 6

Image recognition flow chart

Figure 6 shows the recognition process combining the PSO algorithm and the BP neural network. Firstly, the invariant moment features of the input image are extracted, and the one-dimensional invariant moment feature vector of the target is taken as the recognition feature. After feature extraction and vector normalization, the features are fed into the PSO-optimized neural network for classification and discrimination, after which the recognition result is output.
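The flow in Fig. 6 can be sketched as follows: PSO searches the weight space of a small tanh network, with the training error as the fitness function. The network size, the toy feature vectors, and the PSO constants below are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch of the PSO + BP combination: PSO optimizes the weights of a tiny
# tanh network; fitness is the mean squared training error.
# Data, network size, and PSO constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)

# Toy two-dimensional "feature vectors" with two target classes.
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def forward(theta, X):
    """One hidden layer (3 tanh units) and one sigmoid output unit."""
    W1, b1 = theta[:6].reshape(2, 3), theta[6:9]
    W2, b2 = theta[9:12], theta[12]
    h = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def error(theta):
    return np.mean((forward(theta, X) - y) ** 2)

# Plain PSO over the 13 weight/bias parameters.
m, dim = 30, 13
pos = rng.normal(size=(m, dim))
vel = np.zeros((m, dim))
pbest, pbest_val = pos.copy(), np.array([error(t) for t in pos])
gbest = pbest[pbest_val.argmin()].copy()
for _ in range(200):
    r1, r2 = rng.random((m, dim)), rng.random((m, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    val = np.array([error(t) for t in pos])
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[pbest_val.argmin()].copy()

acc = np.mean((forward(gbest, X) > 0.5) == (y > 0.5))
```

In a full system, `gbest` would serve as the initial weights of the BP network, after which standard backpropagation fine-tunes them on the training samples.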

3 Results and discussion

3.1 Saliency target outcome analysis

The proposed method is tested on the Water-Net dataset. Four visual saliency algorithms are selected for comparison: Context-Aware (CA), Histogram-based Contrast (HC), Graph-Regularized (GR), and Spectral Residual (SR). The F-measure value (an index evaluating the degree of fit between the predicted saliency map and the ground-truth saliency map) and the Mean Absolute Error (MAE) of each algorithm are calculated and compared in Fig. 7:

Fig. 7

Saliency target result graph a F value; b MAE

Figure 7 reveals that the F value of the proposed algorithm is 0.843, while those of GR, SR, HC, and CA are 0.806, 0.777, 0.769, and 0.599, respectively. The MAE value of the proposed algorithm is 0.166, while those of GR, CA, HC, and SR are 0.217, 0.223, 0.23, and 0.271, respectively. Therefore, both GR and the proposed method achieve a large F value and a small MAE, giving them a clear advantage over the other methods, and the results are consistent with the subjective evaluation. Compared with the GR method, the proposed algorithm has a higher F value and a lower MAE value, so it is superior to GR.
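For reference, the two indices above can be computed as in the following sketch. The adaptive binarization rule (twice the mean saliency) and beta² = 0.3 are common conventions in saliency evaluation, not values stated in the text; the toy maps are illustrative.

```python
# Sketch of the two saliency indices: F-measure on a binarized saliency
# map against the ground truth, and MAE on the raw map.
# The binarization rule and beta^2 = 0.3 are conventional assumptions.
import numpy as np

def f_measure(pred, gt, beta2=0.3):
    binary = pred >= 2 * pred.mean()      # adaptive threshold
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom else 0.0

def mae(pred, gt):
    return np.abs(pred - gt.astype(float)).mean()

# Toy example: the predicted map matches the true square region closely.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.where(gt, 0.9, 0.1)
f_val, mae_val = f_measure(pred, gt), mae(pred, gt)
```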

3.2 Analysis of underwater image sharpening

The underwater images used in the experiment are derived from a publicly available underwater image dataset containing 950 images. The proposed underwater image target detection algorithm is compared with the Dark Channel Prior (DCP) and Underwater DCP (UDCP) algorithms. For each algorithm, the two-dimensional image entropy, the underwater color image quality, and the image naturalness are evaluated relative to the original images in Fig. 8:

Fig. 8

Underwater image sharpening results a image two-dimensional entropy comparison; b color quality result map; c naturalness result map

Figure 8 compares the restoration results in terms of the two-dimensional entropy of the underwater images. The entropies of the test samples restored by the DCP method are 9.4434, 9.2938, 9.5743, 9.9875, 9.4773, and 10.3662; these are slightly larger than those of the original images, so the improvement exists but is not obvious. The corresponding results of the UDCP method are 9.789, 10.7861, 9.5438, 10.002, 10.7817, and 10.924, and those of the proposed algorithm are 10.885, 10.6195, 9.8634, 10.3524, 11.4143, and 11.0356. Hence, the proposed method improves the image entropy the most, indicating the largest increase in the amount of information in the restored images and giving the designed method a clear advantage. In the evaluation of underwater color image quality, the average score of the original images is 11.95, that of the DCP method is 15.13, that of the UDCP method is 21.76, and that of the proposed underwater target detection algorithm is 27.82. Therefore, the quality of the images recovered by the DCP method is slightly higher than that of the original images, the definition of the images recovered by the UDCP method is greatly improved, and the proposed method outperforms both DCP and UDCP in clarity, color, and brightness. Taking image naturalness as an indicator, the average score of the original images is 11.97, that of the DCP method is 12.47, that of the UDCP method is 11.41, and that of the proposed method is 11.27. Hence, UDCP and the proposed method maintain high image naturalness. In general, the proposed method achieves the best overall effect.
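The two-dimensional image entropy reported above pairs each gray value with its neighbourhood mean and computes entropy over the joint histogram. A minimal sketch follows, in which the 3 × 3 neighbourhood and the bin count are illustrative choices.

```python
# Two-dimensional image entropy sketch: entropy of the joint histogram of
# (pixel gray value, 3x3 neighbourhood mean). The bin count is illustrative.
import numpy as np

def entropy_2d(img, bins=64):
    img = img.astype(float)
    padded = np.pad(img, 1, mode='edge')
    # 3x3 neighbourhood mean via nine shifted windows.
    nb = sum(padded[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)) / 9.0
    joint, _, _ = np.histogram2d(img.ravel(), nb.ravel(),
                                 bins=bins, range=[[0, 255], [0, 255]])
    p = joint / joint.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
flat = np.full((64, 64), 128.0)         # no detail: entropy near zero
noisy = rng.uniform(0, 255, (64, 64))   # rich detail: high entropy
e_flat, e_noisy = entropy_2d(flat), entropy_2d(noisy)
```

A flat image collapses into a single joint bin and scores near zero, while a detail-rich image spreads over many bins and scores high, which is why higher entropy is read above as more information in the restored image.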

3.3 Image target recognition and tracking analysis

Different experimental samples are selected to study the classification error rate of the proposed underwater image object detection algorithm. At the same time, the gap and relative error between the proposed method and a common positioning method are compared at different positioning distances in Fig. 9:

Fig. 9

Positioning error result diagram a Comparison diagram of measurement results; b Error result

The experimental results in Fig. 9 show that the relative errors of the common underwater image target detection and tracking method at 800 mm, 1000 mm, and 1200 mm are 5.41%, 6.10%, and 6.88%, respectively, while those of the proposed method are 2.56%, 3.24%, and 3.89%. The results show that the proposed method has high accuracy. This is because it considers the influence of multiple underwater refractions and obtains complete and more accurate image parameters after calibration, thereby yielding more accurate positioning results.

4 Discussion

First, tests are conducted on the Water-Net dataset, with four visual saliency algorithms, CA, HC, GR, and SR, selected for comparison, and the F-measure value and MAE of each algorithm are calculated. The results show that, compared with the other algorithms, the proposed algorithm achieves an F value of 0.843 and an MAE value of 0.166; thus, its F value is higher and its MAE is lower. Next, the designed underwater image target detection algorithm is compared with the DCP and UDCP algorithms in terms of the two-dimensional entropy, underwater color image quality, and naturalness of the restored images. It is found that the average image quality score of the designed underwater target extraction and tracking algorithm is 27.82, and its average image naturalness score is 11.27; therefore, the overall effect of the designed method is better. Finally, the classification error rate of target recognition is analyzed. The relative errors of the designed method at the three tested distances are 2.56%, 3.24%, and 3.89%, respectively, suggesting high accuracy. To sum up, the designed underwater target detection and tracking algorithm has better extraction accuracy and provides a certain guarantee of image quality.

5 Conclusion

With the rapid progress of multimedia information technology and network technology, research on image processing and pattern recognition is increasingly extensive. Image processing and pattern recognition is a frontier subject with important theoretical and practical value, especially in the field of underwater vision. The development of smart blockchain technology can also help image recognition algorithms further improve recognition accuracy. Combined with the PSO algorithm, this work designs an underwater image target detection and tracking algorithm based on VSA and smart blockchain technology. Meanwhile, the PSO algorithm is employed to further optimize underwater image target detection, and simulation experiments are conducted on existing underwater image databases. The results show that the designed algorithm has a higher F value, lower MAE, and better performance than the other VSA algorithms. The average image quality score achieved by the designed underwater target extraction and tracking algorithm is 27.82, and the average image naturalness score is 11.27. Compared with other algorithms, the designed algorithm performs better in the overall extraction of underwater images and achieves better positioning accuracy in locating underwater image targets. The innovation of this research is that the results can provide references and suggestions for subsequent work on target detection and tracking in underwater images: by combining the visual saliency algorithm with PSO, the quality and efficiency of underwater target extraction are optimized. The limitation is that the research time was short and the sample size limited, so the scope and depth of the investigation still have deficiencies that need to be expanded. Besides, underwater image target recognition technology keeps pace with the times, and new technologies will constantly be updated and applied. In future work, theory and practice will be combined more deeply for further research.