Abstract
The adaptive cruise control (ACC) problem can be transformed into an optimal tracking control problem for complex nonlinear systems. In this paper, a novel, highly efficient model-free adaptive dynamic programming (ADP) approach with experience replay is proposed to design the ACC controller. Experience replay increases data efficiency by recording the available driving data and repeatedly presenting them to the learning procedure of the acceleration controller in the ACC system. The learning framework that combines ADP with experience replay is described in detail. The distinguishing feature of the algorithm is that, when estimating the parameters of the critic network and the actor network with gradient rules, the gradients of historical data and current data are used to update the parameters concurrently. It is proved with Lyapunov theory that the weight estimation errors of the actor network and the critic network are uniformly ultimately bounded under the novel weight update rules. Experiments on the learning performance of the ACC controller implemented with this ADP algorithm clearly demonstrate that experience replay increases data efficiency significantly, and the approximate optimality and adaptability of the learned control policy are verified in typical driving scenarios.
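As a minimal sketch of this concurrent update idea (not the paper's exact formulation), the following Python fragment combines the gradient on the current driving sample with gradients on a replayed mini-batch of recorded samples; the buffer size, learning rates, and the gradient callables `critic_grad` and `actor_grad` are hypothetical placeholders.

```python
import random
import numpy as np

class ReplayADP:
    """Sketch: actor-critic weight updates that mix the gradient on the
    current transition with gradients on replayed historical transitions."""

    def __init__(self, n_critic, n_actor, lr_c=0.01, lr_a=0.01,
                 buffer_size=1000, batch_size=16):
        self.w_c = np.zeros(n_critic)   # critic network weights
        self.w_a = np.zeros(n_actor)    # actor network weights
        self.lr_c, self.lr_a = lr_c, lr_a
        self.buffer = []                # recorded driving data
        self.buffer_size, self.batch_size = buffer_size, batch_size

    def store(self, transition):
        """Record a driving sample (state, action, reward, next state)."""
        self.buffer.append(transition)
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)          # discard the oldest sample

    def update(self, transition, critic_grad, actor_grad):
        """One gradient step on the current sample plus a replayed batch."""
        self.store(transition)
        batch = random.sample(self.buffer,
                              min(self.batch_size, len(self.buffer)))
        # Current-data and historical-data gradients are summed, so each
        # recorded sample keeps shaping the weights (concurrent update).
        g_c = critic_grad(self.w_c, transition) \
            + sum(critic_grad(self.w_c, t) for t in batch)
        g_a = actor_grad(self.w_a, self.w_c, transition) \
            + sum(actor_grad(self.w_a, self.w_c, t) for t in batch)
        self.w_c -= self.lr_c * g_c
        self.w_a -= self.lr_a * g_a
```

Summing the replayed gradients with the current one lets each recorded sample contribute to many updates, which is the source of the data efficiency claimed above.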
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Nos. 61603150, 61273136, 61573353 and 61533017), the National Key Research and Development Plan (No. 2016YFB0101000), and the Doctoral Foundation of the University of Jinan (No. XBS1605).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Appendices
A Proof of Lemma 1
The first difference of \(L_1(t)\) is
With the basic property of the matrix trace, \(\mathrm{tr}(AB)=\mathrm{tr}(BA)\), we have
Substituting (9) into (26), we obtain
where
With (7), we have
For simplicity, we denote
then
Substituting \(\mathfrak {R}_1(t)\) and \(\mathfrak {R}_2(t)\) into (27), we obtain
Applying the inequality \(\Vert x+y\Vert ^2\le 2\Vert x\Vert ^2+2\Vert y\Vert ^2\), which follows from the Cauchy–Schwarz inequality, we can derive (20).
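To see why this bound holds, expand the square: \(\Vert x+y\Vert ^2=\Vert x\Vert ^2+2\langle x,y\rangle +\Vert y\Vert ^2\), and bound the cross term by \(2\langle x,y\rangle \le \Vert x\Vert ^2+\Vert y\Vert ^2\), which follows from \(\Vert x-y\Vert ^2\ge 0\).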
The proof is completed.
B Proof of Lemma 2
The first difference of \(L_2(t)\) is
Substituting (15) into (31), we have
By the Cauchy–Schwarz inequality, we can derive (22).
The proof is completed.
C Proof of Theorem 1
Consider the following Lyapunov function candidate
where
The first difference of L(t) is
By employing Lemma 1 and Lemma 2, we obtain
where
With Assumption 1 and the Cauchy–Schwarz inequality, we derive from (35)
where \(r_m\) and \(C_m\) are the upper bounds of \(\Vert r(t)\Vert \) and \(\Vert C(t)\Vert \), respectively, that is, \(\Vert r(t)\Vert \le r_m\) and \(\Vert C(t)\Vert \le C_m\).
Select the parameters to satisfy
and the parameters \(\lambda _i>0\ (i=1,2)\) are chosen as
Then, if (37) and (38) hold, for any
the first difference satisfies \(\Delta L(t)<0\).
Note that \(\Vert \xi _\mathrm{c}(t)\Vert \le \Vert \tilde{w}_\mathrm{c}(t)\Vert \Vert \phi _\mathrm{cm}\Vert \) and \(\Vert \xi _\mathrm{a}(t)\Vert \le \Vert \tilde{w}_\mathrm{a}(t)\Vert \Vert \phi _\mathrm{am}\Vert \), then by (39) and (40), we have
With the standard Lyapunov extension theorem, we can conclude that the weight estimation errors of the critic network, \(\tilde{w}_\mathrm{c}(t)\), and the actor network, \(\tilde{w}_\mathrm{a}(t)\), are uniformly ultimately bounded (UUB) with bounds given by the positive constants \(\mathfrak {B}_\mathrm{c}\) and \(\mathfrak {B}_\mathrm{a}\).
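In the standard sense, this means there exists a finite time \(T\), independent of the initial time \(t_0\), such that \(\Vert \tilde{w}_\mathrm{c}(t)\Vert \le \mathfrak {B}_\mathrm{c}\) and \(\Vert \tilde{w}_\mathrm{a}(t)\Vert \le \mathfrak {B}_\mathrm{a}\) for all \(t\ge t_0+T\).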
The proof is completed.
Cite this article
Wang, B., Zhao, D. & Cheng, J. Adaptive cruise control via adaptive dynamic programming with experience replay. Soft Comput 23, 4131–4144 (2019). https://doi.org/10.1007/s00500-018-3063-7