Elsevier

Neurocomputing

Volume 399, 25 July 2020, Pages 479-490
Reinforcement learning control for underactuated surface vessel with output error constraints and uncertainties

https://doi.org/10.1016/j.neucom.2020.03.021

Highlights

  • We present a reinforcement-learning-based trajectory tracking method for an underactuated surface vessel with model uncertainties and disturbances.

  • A novel reinforcement learning method based on an actor-critic neural network structure is proposed to guarantee the convergence of tracking errors.

  • An error transformation function is introduced to tackle the output constraint problem, ensuring that the tracking errors stay within the constraint boundaries.

  • A novel critic function, comprising the primary critic signal and a second critic signal, is used instead of the long-term cost function to supervise the tracking performance and tune the weights of the actor NN.

  • The actor-critic neural network structure, which does not rely on the system model and does not need off-line training, is used to approximate and compensate for the unknown nonlinear system dynamics and disturbances, so as to improve the system performance.

Abstract

This study investigates the trajectory tracking control problem of an underactuated marine vessel in the presence of output constraints, model uncertainties, and environmental disturbances. An error transformation technique ensures that the tracking errors remain within the predefined constraint boundaries. The controller combines a critic function with a reinforcement learning (RL) algorithm based on actor-critic neural networks. The RL method is applied to handle model uncertainties and disturbances, and the critic function modifies the control action to supervise the system performance. Based on Lyapunov’s direct method, a stability analysis is presented to prove that the boundedness of system signals and the desired tracking performance can be guaranteed. Finally, a simulation illustrates the effectiveness and feasibility of the proposed controller.

Introduction

Over the past few decades, as the exploitation of marine resources has intensified, the demand for surface vessels has increased dramatically. Surface vehicles have been developed for marine reconnaissance, scientific research, environmental missions, ocean resource exploration, military uses, and other applications [1], [2]. Surface vessels are generally underactuated; thus, they are also called underactuated surface vessels (USVs). A USV relies on only two effective control inputs to control the vehicle’s three-degree-of-freedom (DOF) motion, so it is more difficult to control than a fully actuated vessel.

Precise tracking of a reference trajectory is essential for completing tasks efficiently. Therefore, trajectory tracking control is a major area of focus in USV research. Researchers have proposed several control schemes for the USV tracking problem [3], [4], [5], [6], [7]. In [8], [9], [10], [11], a series of algorithms were proposed to handle the external disturbances caused by complex marine conditions. To address the fact that the USV model cannot be accurately determined in actual navigation, researchers incorporated disturbance observers [12], [13], [14], [15] or adaptive laws [16], [17]. Given the strong ability of a neural network (NN) to approximate nonlinear functions, researchers applied NNs to USV tracking control to estimate the nonlinear part of the controller [18], [19], [20], [21]. Peng et al. [22] developed a robust adaptive steering law with an NN to identify the unknown model dynamics as well as to reconstruct the system states corrupted by measurement noises [23]. A multi-layer NN combined with adaptive robust control techniques was employed to improve the USV’s robustness against unmodeled dynamics and environmental disturbances in [24]. Similarly, the backstepping method combined with an NN and an adaptive law is also a reliable way to handle uncertainties and ocean disturbances and to ensure effective tracking of the reference trajectory [25], [26].

All the above studies have designed reliable trajectory tracking algorithms that attempt to solve problems such as actuator saturation, performance prediction, environmental disturbances, and model uncertainties using an NN and observer. However, performance optimization has not been fully studied. Optimal control can be employed not only to solve the nonlinear system tracking control problem but also to optimize the tracking performance of the system. In general, the optimal control solution of a nonlinear system is obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation. However, owing to its inherent nonlinearity, complexity, and intractability, it is almost impossible to solve the equation directly. Therefore, most optimal control problems come down to the treatment of the HJB equation [27], [28], [29]. Among these treatments, the reinforcement learning (RL) strategy based on actor-critic neural networks (AC-NNs) has been successfully applied to optimal control and has become a popular method of solving the HJB problem [30].

Essentially, RL uses feedback signals from the environment to evaluate and improve its actions. The AC-NNs-based RL algorithm integrates value-based (such as Q-learning) and policy-based (such as policy gradients) reinforcement learning algorithms to produce improved effects. It is generally composed of two neural networks: the actor NN and the critic NN. The critic NN is used to evaluate the current system performance and guide the next phase of the actor NN operation to improve performance. Guided by the critic NN's evaluation, the actor NN generates the output that controls the system. For the spring-mass-damper trajectory tracking problem, [31] designed a self-learning optimal control strategy based on adaptive AC-NNs. In [32], an optimized control technique based on a backstepping algorithm combined with an AC-NN strategy was proposed to implement tracking control for a class of strict-feedback systems. Pane et al. [33] and Tang et al. [34] provided novel controllers based on an AC-NN scheme to solve the trajectory tracking problem for a nonlinear time-varying system. AC-NNs have also been extended to the trajectory tracking control of marine vehicles, including USVs and autonomous underwater vehicles [35], [36]. On the basis of RL, Yin et al. [37] and Guo et al. [38] proposed an optimal control algorithm that can obtain the optimized control policy, and [39] presented a constant avoidance angle algorithm that makes the USV retain an avoidance angle. However, it must be noted that the output constraint problem was not considered in any of the above-mentioned works.
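The interplay between the two parts can be illustrated with a deliberately tiny, stateless sketch. It is not the AC-NN controller of this paper: the two-action reward table, the learning rates, and the function names are hypothetical, and plain scalars stand in for the neural networks. The critic keeps a value estimate (the value-based part), and the actor takes a policy-gradient step scaled by the critic's evaluation (the policy-based part).

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(rewards=(0.0, 1.0), steps=2000,
                       actor_rate=0.1, critic_rate=0.1, seed=0):
    """Stateless actor-critic; returns (action probabilities, critic value).

    `rewards` is a hypothetical fixed reward per action, used only for
    illustration.
    """
    rng = random.Random(seed)
    prefs = [0.0 for _ in rewards]   # actor parameters (policy-based part)
    value = 0.0                      # critic estimate (value-based part)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        # Critic: how much better was the outcome than currently expected?
        advantage = rewards[action] - value
        # Actor: policy-gradient step scaled by the critic's evaluation.
        for a in range(len(prefs)):
            grad = (1.0 - probs[a]) if a == action else -probs[a]
            prefs[a] += actor_rate * advantage * grad
        # Critic: move the value estimate toward the observed reward.
        value += critic_rate * advantage
    return softmax(prefs), value


if __name__ == "__main__":
    probabilities, critic_value = train_actor_critic()
    print(probabilities, critic_value)
```

In this toy setting the actor's preference for the higher-reward action can only grow, because the critic's evaluation is non-negative for that action and non-positive for the other one.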

The output constraint problem is another issue worth considering in USV trajectory tracking control, because respecting such constraints is necessary for safety during practical navigation [40], [41], [42]. A USV cannot act arbitrarily during navigation, especially when passing through narrow gaps in the water, owing to the risk of hitting rocks, colliding with riverbanks, and other factors. In fact, some tracking control algorithms deal with the output constraint problem by combining a barrier Lyapunov function [43], [44], [45] with an error conversion function [46], prescribed performance [47], or other schemes. Similarly, the RL control method based on the AC-NN structure can also achieve this boundedness when combined with an output constraint technique [48], [49]. Therefore, this study develops a control algorithm based on an AC-NN structure to address the USV trajectory tracking control problem under output constraints, model uncertainties, and environmental disturbances. First, an error transformation technique is employed to ensure that the tracking errors remain within the constraint boundaries. Next, a new critic function is introduced instead of the long-term cost function and the critic NN to supervise the system performance as well as tune the weights of the AC-NNs. Further, the actor NN is designed to approximate the unknown system function. Finally, Lyapunov stability theory is applied to analyze the boundedness of the system signals.

The main contributions of this study can be summarized as follows:

  • (1) Unlike the traditional RL algorithm, a novel critic function comprised of the primary critic signal and a second critic signal is employed instead of the long-term cost function to supervise the tracking performance and tune the weights of the actor NN.

  • (2) An error transformation technique is presented to ensure the tracking errors remain within the constraint boundaries.

  • (3) The RL algorithm based on an AC-NN structure, which does not rely on the system model and does not need off-line training, is used to approximate and compensate the system unknown nonlinear dynamics and disturbances so as to improve the system performance.
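The idea behind an error transformation can be sketched as follows. The specific function used in the paper is not shown in this excerpt, so the tangent-type map below is only one common choice and is an assumption, as are the function names and the symmetric bound. It maps a tracking error inside the open interval (-bound, bound) to an unconstrained variable; keeping the transformed variable bounded then keeps the original error strictly inside the boundary.

```python
import math


def transform_error(e, bound):
    """Map a constrained error e in (-bound, bound) to an unconstrained z.

    Assumed form: z = tan(pi * e / (2 * bound)), which grows without limit
    as e approaches either boundary.
    """
    if not -bound < e < bound:
        raise ValueError("tracking error is outside the constraint boundary")
    return math.tan(math.pi * e / (2.0 * bound))


def recover_error(z, bound):
    """Inverse map: any finite z gives an error strictly inside the bounds."""
    return (2.0 * bound / math.pi) * math.atan(z)


if __name__ == "__main__":
    z = transform_error(0.3, bound=0.5)
    print(z, recover_error(z, bound=0.5))
```

Because the inverse map saturates, a controller that only guarantees a bounded transformed error automatically respects the original constraint, which is the property the contribution above relies on.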

The rest of this paper is organized as follows. The mathematical model of the USV, an introduction to the radial basis function NN, and some useful assumptions and lemmas are presented in Section 2. Section 3 introduces the detailed design process. The numerical simulation results are given in Section 4. Section 5 concludes the paper.

Section snippets

Underactuated surface vehicle modeling

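The USV frame is shown in Fig. 1. In this paper, the mathematical model of a USV subjected to environmental disturbances is described as follows:

$$
\begin{cases}
\dot{x} = u\cos\psi - v\sin\psi \\
\dot{y} = u\sin\psi + v\cos\psi \\
\dot{\psi} = r \\
\dot{u} = f_u + g_u\tau_u + d_u \\
\dot{v} = f_v + d_v \\
\dot{r} = f_r + g_r\tau_r + d_r
\end{cases}
$$

where

$$
\begin{cases}
f_u = \dfrac{m_{22}}{m_{11}}vr - \dfrac{d_{11}}{m_{11}}u \\
f_v = -\dfrac{m_{11}}{m_{22}}ur - \dfrac{d_{22}}{m_{22}}v \\
f_r = \dfrac{m_{11} - m_{22}}{m_{33}}uv - \dfrac{d_{33}}{m_{33}}r \\
g_u = 1/m_{11} \\
g_r = 1/m_{33}
\end{cases}
$$

$(x, y, \psi)$ defines the position coordinate and the orientation of the vehicle in the inertial reference frame (IRF), $(u, v, r)$ represents the surge velocity, sway velocity, and yaw velocity, respectively, τ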

Controller design

In this section, the error transformation technique is used first, to ensure that the tracking errors always stay within the predefined boundaries. Then, an actor NN is constructed to approximate the model uncertainty by estimating the unknown continuous nonlinear functions $f_i(x)$ ($i = u, v, r$). Following this, the critic NN and a new critic function are created to supervise the system performance and modify the control action. Finally, the control inputs are designed based on the AC-NNs method to

Simulation results

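In this section, some numerical simulation results are presented to verify the effectiveness of the proposed AC-NNs control method. We set the ideal USV trajectory as follows:

$$
\begin{cases}
\dot{x}_d = u_d\cos\psi_d \\
\dot{y}_d = u_d\sin\psi_d \\
\dot{\psi}_d = r_d \\
\dot{u}_d = 2e^{-1t} \\
\dot{v}_d = 0 \\
\dot{r}_d = 0.06e^{-1t}
\end{cases}
$$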
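A reference trajectory of this kind can be generated numerically, as in the minimal sketch below. It reads the exponentials as decaying, that is, it assumes the surge and yaw reference accelerations are 2·exp(-t) and 0.06·exp(-t). The initial values, the step size, and the function names are hypothetical, since the excerpt does not state them.

```python
import math


def reference_rates(t, u_d0=0.0, r_d0=0.0):
    """Closed-form u_d(t) and r_d(t) obtained by integrating the given rates.

    Assumes du_d/dt = 2*exp(-t) and dr_d/dt = 0.06*exp(-t); the initial
    values u_d0 and r_d0 are hypothetical placeholders.
    """
    u_d = u_d0 + 2.0 * (1.0 - math.exp(-t))
    r_d = r_d0 + 0.06 * (1.0 - math.exp(-t))
    return u_d, r_d


def reference_trajectory(t_end, dt=0.001, start=(0.0, 0.0, 0.0)):
    """Euler-integrate the reference pose (x_d, y_d, psi_d) up to t_end."""
    x_d, y_d, psi_d = start
    steps = int(round(t_end / dt))
    for k in range(steps):
        u_d, r_d = reference_rates(k * dt)
        x_d += dt * u_d * math.cos(psi_d)
        y_d += dt * u_d * math.sin(psi_d)
        psi_d += dt * r_d
    return x_d, y_d, psi_d


if __name__ == "__main__":
    print(reference_rates(5.0))
    print(reference_trajectory(10.0))
```

Under these assumptions the reference speeds settle to constants, so the reference path tends toward a circular arc traversed at constant speed.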

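Referring to [38], the system parameters were chosen as

$$
\begin{aligned}
& m_{11} = 1.96,\quad m_{22} = 2.4,\quad m_{33} = 0.043; \\
& d_{11} = 0.7225 + 1.3274|u| + 5.8664v^2 \\
& d_{22} = 5.8612 + 36.2823|v| + 8.05|r| \\
& d_{33} = 1.9000 - 0.0800|v| + 0.750|r|
\end{aligned}
$$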
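To make the role of these parameters concrete, the sketch below evaluates the USV model of Section 2 with the values listed above. It assumes the standard three-DOF form for the kinematics and for the terms f_u, f_v, and f_r, and a minus sign in d33; the function names, the disturbance argument, and the Euler step are illustrative choices rather than the simulation code used in the paper.

```python
import math

M11, M22, M33 = 1.96, 2.4, 0.043  # inertia parameters from the text


def damping(u, v, r):
    """Velocity-dependent damping terms d11, d22, d33 as listed in the text."""
    d11 = 0.7225 + 1.3274 * abs(u) + 5.8664 * v ** 2
    d22 = 5.8612 + 36.2823 * abs(v) + 8.05 * abs(r)
    d33 = 1.9 - 0.08 * abs(v) + 0.75 * abs(r)  # minus sign is an assumption
    return d11, d22, d33


def usv_derivatives(state, tau_u, tau_r, dist=(0.0, 0.0, 0.0)):
    """Time derivative of (x, y, psi, u, v, r) for the 3-DOF model."""
    x, y, psi, u, v, r = state
    d11, d22, d33 = damping(u, v, r)
    du, dv, dr = dist
    # Assumed standard form of the nonlinear terms.
    f_u = (M22 / M11) * v * r - (d11 / M11) * u
    f_v = -(M11 / M22) * u * r - (d22 / M22) * v
    f_r = ((M11 - M22) / M33) * u * v - (d33 / M33) * r
    return (
        u * math.cos(psi) - v * math.sin(psi),
        u * math.sin(psi) + v * math.cos(psi),
        r,
        f_u + tau_u / M11 + du,   # surge is actuated by tau_u
        f_v + dv,                 # sway has no control input
        f_r + tau_r / M33 + dr,   # yaw is actuated by tau_r
    )


def euler_step(state, tau_u, tau_r, dt=0.01):
    """Advance the state by one explicit Euler step of length dt."""
    deriv = usv_derivatives(state, tau_u, tau_r)
    return tuple(s + dt * d for s, d in zip(state, deriv))


if __name__ == "__main__":
    state = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    for _ in range(100):
        state = euler_step(state, tau_u=1.0, tau_r=0.0)
    print(state)
```

The sway equation receives no control input, which is what makes the vessel underactuated: only the surge force and the yaw moment are available to steer all three degrees of freedom.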

The main contribution of this paper is to propose a control algorithm based on AC-NNs with output

Conclusion

This paper proposes an adaptive tracking control method based on an AC-NN structure for USVs in the presence of unknown model nonlinearities, environmental disturbances, and output constraints. By using an error transformation technique to handle the error constraint problem, the controller based on AC-NNs ensures that the USV can accurately track the reference trajectory. The stability of the system is proved by Lyapunov stability theory, and the effectiveness of the controller is finally verified

CRediT authorship contribution statement

Zewei Zheng: Conceptualization, Investigation, Writing - review & editing. Linping Ruan: Methodology, Software, Validation, Writing - original draft, Writing - review & editing. Ming Zhu: Investigation, Resources. Xiao Guo: Project administration, Writing - review & editing.

Declaration of Competing Interest

None.

Acknowledgment

This work was supported by the Beijing Natural Science Foundation (No. 4202038), the National Key R&D Program of China (No. 2018YFC1506401), and the National Natural Science Foundation of China (No. 61827901).

References (52)

  • J. Sun et al.

    Disturbance observer-based robust missile autopilot design with full-state constraints via adaptive dynamic programming

    J. Franklin Inst.

    (2018)
  • I. Carlucho et al.

    Adaptive low-level control of autonomous underwater vehicles using deep reinforcement learning

    Rob. Auton. Syst.

    (2018)
  • Z. Yin et al.

    Control design of a marine vessel system using reinforcement learning

    Neurocomputing

    (2018)
  • X. Jin

    Fault-tolerant iterative learning control for mobile robots non-repetitive trajectory tracking with output constraints

    Automatica

    (2018)
  • J.H. Li et al.

    Point-to-point navigation of underactuated ships

    Automatica

    (2008)
  • C. Lin

    H∞ reinforcement learning control of robot manipulators using fuzzy wavelet networks

    Fuzzy Sets Syst.

    (2009)
  • Y. Luo et al.

    Adaptive critic design-based robust neural network control for nonlinear distributed parameter systems with unknown dynamics

    Neurocomputing

    (2015)
  • X. Xiang et al.

    Survey on fuzzy-logic-based guidance and control of marine surface vehicles and underwater vehicles

    Int. J. Fuzzy Syst.

    (2018)
  • D. Alejandro et al.

    Trajectory tracking passivity-based control for marine vehicles subject to disturbances

    J. Franklin Inst. Eng. Appl. Math.

    (2017)
  • A.A. Pedro et al.

    Trajectory-tracking and path-following of underactuated autonomous vehicles with parametric modeling uncertainty

    IEEE Trans. Automat. Contr.

    (2007)
  • Z. Zheng et al.

    Error-constrained LOS path following of a surface vessel with actuator saturation and faults

    IEEE Trans. Syst. Man Cybern. Syst.

    (2018)
  • Z. Zheng et al.

    Finite-time path following control for a stratospheric airship with input saturation and error constraint

    Int. J. Control

    (2019)
  • K.D. Do

    Global robust adaptive path-tracking control of underactuated ships under stochastic disturbances

    Ocean Eng.

    (2016)
  • Z. Sun et al.

    Robust adaptive trajectory tracking control of underactuated surface vessel in fields of marine practice

    J. Mar. Sci. Technol.

    (2018)
  • N. Wang et al.

    Direct adaptive fuzzy tracking control of marine vehicles with fully unknown parametric dynamics and uncertainties

    IEEE Trans. Control Syst. Technol.

    (2016)
  • J.-H. Li

    Path tracking of underactuated ships with general form of dynamics

    Int. J. Control

    (2016)
    Zewei Zheng received the B.S. degree in automatic control from the Beijing Institute of Technology, Beijing, China, in 2006 and the Ph.D. degree in control theory and control engineering from Beihang University, Beijing, in 2012. He worked as a Post-Doctoral Fellow and an Assistant Professor at Beihang University from 2012 to 2019. From 2016 to 2017, he was an academic visitor at Nanyang Technological University, Singapore. He is currently an Associate Professor at the School of Automation Science and Electrical Engineering, Beihang University. His current research interests include nonlinear control systems, motion control, and flight control.

    Linping Ruan received the B.Eng. degree from Dalian Maritime University, Dalian, China, in 2017. She is currently pursuing the M.Sc. degree at Beihang University. Her current research interests include nonlinear control systems and motion control.

    Ming Zhu received the M.S. and Ph.D. degrees in aircraft design from Beihang University (BUAA), Beijing, China, in 1998 and 2006, respectively. He worked as a postdoctoral fellow at BUAA from 2006 to 2007. Currently, he is a Professor at the Institute of Unmanned System, BUAA. His research interests include design and control of unmanned vehicles.

    Xiao Guo received the B.S. degree in aerocraft design from Beihang University, Beijing, China, in 2009 and the Ph.D. degree in aerocraft design engineering from Beihang University, Beijing, in 2013. He was a Post-Doctoral Fellow with Beihang University from 2013 to 2018. He is currently an Assistant Professor with the Research Institute of Frontier Science, Beihang University. His current research interests include reinforcement learning, machine learning, and flight control.
