
Adaptive cruise control via adaptive dynamic programming with experience replay

Soft Computing (Methodologies and Application)

Abstract

The adaptive cruise control (ACC) problem can be transformed into an optimal tracking control problem for complex nonlinear systems. In this paper, a novel, highly efficient model-free adaptive dynamic programming (ADP) approach with experience replay is proposed to design the ACC controller. Experience replay increases data efficiency by recording the available driving data and repeatedly presenting them to the learning procedure of the acceleration controller in the ACC system. The learning framework that combines ADP with experience replay is described in detail. The distinguishing feature of the algorithm is that, when estimating the parameters of the critic network and the actor network with gradient rules, the gradients of historical data and current data are used to update the parameters concurrently. It is proved with Lyapunov theory that the weight estimation errors of the actor network and the critic network are uniformly ultimately bounded under the novel weight update rules. The learning performance of the ACC controller implemented with this ADP algorithm clearly demonstrates that experience replay increases data efficiency significantly, and the approximate optimality and adaptability of the learned control policy are verified in typical driving scenarios.
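As a rough illustration of the core idea (not the authors' exact update rules), the following sketch shows a linear critic whose temporal-difference gradient for the current sample is combined with the gradients of replayed historical samples in one concurrent update. The class name, buffer layout, and learning rate are hypothetical:

```python
import numpy as np

class ReplayCritic:
    """Linear critic trained with the TD gradients of both the current
    sample and replayed historical samples (hypothetical concurrent form)."""

    def __init__(self, n_features, beta=0.1, gamma=0.9, buffer_size=20):
        self.w = np.zeros(n_features)   # critic weights
        self.beta = beta                # learning rate
        self.gamma = gamma              # discount factor
        self.buffer = []                # replay memory of (phi, phi_next, r)
        self.buffer_size = buffer_size

    def _td_gradient(self, phi, phi_next, r):
        # TD error e = gamma * w^T phi_next + r - w^T phi;
        # gradient of 0.5 * e^2 with respect to w
        e = self.gamma * self.w @ phi_next + r - self.w @ phi
        return e * (self.gamma * phi_next - phi)

    def update(self, phi, phi_next, r):
        # concurrent gradient: current sample plus all replayed samples
        grad = self._td_gradient(phi, phi_next, r)
        for p, pn, rr in self.buffer:
            grad += self._td_gradient(p, pn, rr)
        self.w -= self.beta * grad
        if len(self.buffer) >= self.buffer_size:
            self.buffer.pop(0)          # drop the oldest experience
        self.buffer.append((phi, phi_next, r))

# Toy check: one constant feature, reward 1, discount 0.9,
# so the value estimate should approach 1 / (1 - 0.9) = 10.
critic = ReplayCritic(n_features=1)
phi = np.array([1.0])
for _ in range(500):
    critic.update(phi, phi, 1.0)
```

Because every replayed sample contributes its own gradient term, the effective step toward the fixed point grows with the buffer, which is the data-efficiency effect the paper exploits.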




Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 61603150, 61273136, 61573353 and 61533017), the National Key Research and Development Plan (No. 2016YFB0101000), and the Doctoral Foundation of the University of Jinan (No. XBS1605).

Author information

Corresponding author

Correspondence to Bin Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Appendices

A Proof of Lemma 1

The first difference of \(L_1(t)\) is

$$\begin{aligned} \Delta L_1(t)=\,&\frac{\lambda _1}{\beta }tr\left[ \tilde{w}^T_\mathrm{c}(t+1) \tilde{w}_\mathrm{c}(t+1)-\tilde{w}^T_\mathrm{c}(t)\tilde{w}_\mathrm{c}(t)\right] \nonumber \\ =\,&\frac{\lambda _1}{\beta }tr\left[ (\tilde{w}_\mathrm{c}(t) \right. +\Delta \hat{w}_\mathrm{c}(t))^T(\tilde{w}_\mathrm{c}(t)+\Delta \hat{w}_\mathrm{c}(t))\nonumber \\&-\,\left. \tilde{w}^T_\mathrm{c}(t)\tilde{w}_\mathrm{c}(t)\right] \end{aligned}$$
(25)

Using the basic trace property \(tr(AB)=tr(BA)\), this becomes

$$\begin{aligned} \Delta L_1(t)=\frac{\lambda _1}{\beta }tr\left[ 2\tilde{w}^T_\mathrm{c}(t)\Delta \hat{w}_\mathrm{c}(t)+\Delta \hat{w}^T_\mathrm{c}(t)\Delta \hat{w}_\mathrm{c}(t)\right] \end{aligned}$$
(26)

Substituting (9) into (26), we obtain

$$\begin{aligned} \Delta L_1(t)=\,&\lambda _1tr\left[ -2\gamma \tilde{w}_\mathrm{c}^T(t)\phi _\mathrm{c}(t)e_\mathrm{c}^T(t)\right. \nonumber \\&-\,2\gamma \tilde{w}_\mathrm{c}^T(t) \sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t) \nonumber \\&+\,\beta \gamma ^2\left( \phi _\mathrm{c}(t)e_\mathrm{c}^T(t)+\sum _{j=1}^{l} \phi _{c_j}(t)e_{c_j}^T(t)\right) ^T \nonumber \\&\times \left. \left( \phi _\mathrm{c}(t)e_\mathrm{c}^T(t)+\sum _{j=1}^{l} \phi _{c_j}(t)e_{c_j}^T(t)\right) \right] \nonumber \\ =&\lambda _1tr\left[ \mathfrak {R}_1(t)+\mathfrak {R}_2(t)\right] \end{aligned}$$
(27)

where

$$\begin{aligned} \mathfrak {R}_1(t)=\,&-2\gamma \tilde{w}_\mathrm{c}^T(t)\phi _\mathrm{c}(t)e_\mathrm{c}^T(t)-2\gamma \tilde{w}_\mathrm{c}^T(t) \sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t)\\ \mathfrak {R}_2(t)=&\beta \gamma ^2\left( \phi _\mathrm{c}(t)e_\mathrm{c}^T(t) +\sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t)\right) ^T\\&\times \left( \phi _\mathrm{c}(t)e_\mathrm{c}^T(t)+\sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t)\right) \end{aligned}$$

With (7), we have

$$\begin{aligned} e_\mathrm{c}(t)=\,&\gamma \hat{w}_\mathrm{c}^T(t)\phi _\mathrm{c}(t)+r(t)-\hat{w}_\mathrm{c}^T(t-1)\phi _\mathrm{c}(t-1)\nonumber \\ =&\gamma \xi _\mathrm{c}(t)+\gamma w_\mathrm{c}^T(t)\phi _\mathrm{c}(t)+r(t)-\hat{w}_\mathrm{c}^T(t-1)\phi _\mathrm{c}(t-1) \end{aligned}$$
(28)

For simplicity, we denote

$$\begin{aligned} P(t)=\gamma w^{T}_\mathrm{c}\phi _\mathrm{c}(t)+r(t)-\hat{w}^T_\mathrm{c}(t-1)\phi _\mathrm{c}(t-1) \end{aligned}$$
(29)

then

$$\begin{aligned} \mathfrak {R}_1(t)=\,&-\,2\gamma \xi _\mathrm{c}(t)\left( \gamma \xi _\mathrm{c}(t)+P(t)\right) ^T\\&-\,2\gamma \tilde{w}_\mathrm{c}^T(t)\sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t) \end{aligned}$$
$$\begin{aligned} \mathfrak {R}_2(t)=\,&\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2 \left( \gamma \xi _\mathrm{c}(t)+P(t)\right) \left( \gamma \xi _\mathrm{c}(t)+P(t)\right) ^T\\&+\,2\beta \gamma ^2\phi _\mathrm{c}(t)(\gamma \xi _\mathrm{c}(t)+P(t))^T\sum _{j=1}^{l} \phi _{c_j}(t)e_{c_j}^T(t)\\&+\,\beta \gamma ^2\bigg \Vert \sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t)\bigg \Vert ^2 \end{aligned}$$

Substituting \(\mathfrak {R}_1(t)\) and \(\mathfrak {R}_2(t)\) into (27), we obtain

$$\begin{aligned} \Delta L_1(t)=\,&-\lambda _1\gamma ^2\left( 1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) \Vert \xi _\mathrm{c}(t)\Vert ^2\nonumber \\&-\,\lambda _1\left( 1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) \Vert \gamma \xi _\mathrm{c}(t)+P(t)\Vert ^2\nonumber \\&-\,\lambda _1\gamma ^2\Vert \tilde{w}_\mathrm{c}(t)\Vert ^2-\lambda _1 \left( 1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) \nonumber \\&\times \bigg \Vert \sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t)\bigg \Vert ^2 -\lambda _1\beta \gamma ^2\Bigg \Vert \gamma \xi _\mathrm{c}(t)\phi _\mathrm{c}^T(t)\nonumber \\&-\,\sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t)\Bigg \Vert ^2 +\lambda _1\left( 1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) \nonumber \\&\times \Vert P(t)\Vert ^2 +\lambda _1\bigg \Vert \gamma \tilde{w}_\mathrm{c}^T(t) -\sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t) \bigg \Vert ^2\nonumber \\&+\,\lambda _1\beta \gamma ^2\bigg \Vert P(t)\phi _\mathrm{c}^T(t)-\sum _{j=1}^{l} \phi _{c_j}(t)e_{c_j}^T(t) \bigg \Vert ^2 \end{aligned}$$
(30)

Applying the Cauchy–Schwarz inequality \(\Vert x+y\Vert ^2\le 2\Vert x\Vert ^2+2\Vert y\Vert ^2\), we can derive (20).
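The norm inequality invoked here, \(\Vert x+y\Vert ^2\le 2\Vert x\Vert ^2+2\Vert y\Vert ^2\), can be spot-checked numerically for random vectors. This snippet is only a sanity check of the inequality, not part of the proof:

```python
import numpy as np

def inequality_holds(x, y):
    # ||x + y||^2 <= 2||x||^2 + 2||y||^2 for any real vectors x, y
    lhs = np.linalg.norm(x + y) ** 2
    rhs = 2 * np.linalg.norm(x) ** 2 + 2 * np.linalg.norm(y) ** 2
    return lhs <= rhs + 1e-9   # small tolerance for floating-point rounding

rng = np.random.default_rng(0)
checks = [inequality_holds(rng.normal(size=8), rng.normal(size=8))
          for _ in range(1000)]
```

Equality is attained when \(x=y\) (e.g. \(x=y=(1,1,1)\) gives \(12\le 12\)), so the factor 2 cannot be improved.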

The proof is completed.

B Proof of Lemma 2

The first difference of \(L_2(t)\) is

$$\begin{aligned} \Delta L_2(t)=&\frac{\lambda _2\gamma ^2}{\alpha }tr \left[ \tilde{w}^T_\mathrm{a}(t+1) \tilde{w}_\mathrm{a}(t+1)-\tilde{w}^T_\mathrm{a}(t)\tilde{w}_\mathrm{a}(t)\right] \nonumber \\ =&\frac{\lambda _2\gamma ^2}{\alpha }tr\left[ (\tilde{w}_\mathrm{a}(t) +\Delta \hat{w}_\mathrm{a}(t))^T(\tilde{w}_\mathrm{a}(t)+\Delta \hat{w}_\mathrm{a}(t))\right. \nonumber \\&-\,\left. \tilde{w}^T_\mathrm{a}(t)\tilde{w}_\mathrm{a}(t)\right] \nonumber \\ =&\frac{\lambda _2\gamma ^2}{\alpha }tr\left[ 2\tilde{w}^T_\mathrm{a}(t) \Delta \hat{w}_\mathrm{a}(t)+\Delta \hat{w}^{T}_\mathrm{a}(t)\Delta \hat{w}_\mathrm{a}(t)\right] \end{aligned}$$
(31)

Substituting (15) into (31), we have

$$\begin{aligned} \Delta L_2(t)=&\lambda _2\gamma ^2\left[ \bigg \Vert \xi _\mathrm{a}(t)C(t) -\hat{w}_\mathrm{c}^T(t)\phi _\mathrm{c}(t)\bigg \Vert ^2\right. \nonumber \\&-\,\Vert \hat{w}_\mathrm{c}^T(t)\phi _\mathrm{c}(t)\Vert ^2-\Vert C(t)\Vert ^2\Vert \xi _\mathrm{a}(t)\Vert ^2+\Vert \tilde{w}_\mathrm{a}(t)\Vert ^2\nonumber \\&-\,\bigg \Vert \tilde{w}_\mathrm{a}^T(t)+\sum _{j=1}^{l} \phi _{a_j}(t)C_j(t)e_{a_j}^T(t)\bigg \Vert ^2\nonumber \\&+\,\bigg \Vert \sum _{j=1}^{l}\phi _{a_j}(t)C_j(t)e_{a_j}^T(t)\bigg \Vert ^2\nonumber \\&+\,\left. \alpha \bigg \Vert \phi _\mathrm{a}(t)C(t)e_\mathrm{a}^T(t){+}\sum _{j=1}^{l} \phi _{a_j}(t)C_j(t)e_{a_j}^T(t)\bigg \Vert ^2\right] \end{aligned}$$
(32)

By the Cauchy–Schwarz inequality, we can derive (22).

The proof is completed.

C Proof of Theorem 1

Consider the following Lyapunov function candidate

$$\begin{aligned} L(t)=L_1(t)+L_2(t) \end{aligned}$$

where

$$\begin{aligned} L_1(t)=&\frac{\lambda _1}{\beta }tr[\tilde{w}^T_\mathrm{c}(t)\tilde{w}_\mathrm{c}(t)]\\ L_2(t)=&\frac{\lambda _2}{\alpha }tr[\tilde{w}^T_\mathrm{a}(t)\tilde{w}_\mathrm{a}(t)] \end{aligned}$$

The first difference of L(t) is

$$\begin{aligned} \Delta L(t)=&L(t+1)-L(t)=\Delta L_1(t)+\Delta L_2(t) \end{aligned}$$
(33)

By employing Lemma 1 and Lemma 2, we obtain

$$\begin{aligned} \Delta L(t) \le&-\gamma ^2\left[ \lambda _1\left( 1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) -2\lambda _2\right] \Vert \xi _\mathrm{c}(t)\Vert ^2\nonumber \\&-\,\lambda _1\left( 1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) \bigg \Vert \gamma \xi ^T_\mathrm{c}(t)+\gamma w^{T}_\mathrm{c}(t)\phi _\mathrm{c}(t)\nonumber \\&+\,r(t)-\hat{w}^T_\mathrm{c}(t-1)\phi _\mathrm{c}(t-1)\bigg \Vert ^2\nonumber \\&-\,\lambda _1\beta \gamma ^2\bigg \Vert \gamma \phi _\mathrm{c}(t)\xi _\mathrm{c}^T(t) -\sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t)\bigg \Vert ^2\nonumber \\&-\,\lambda _2\gamma ^2\Vert C(t)\Vert ^2\Vert \xi _\mathrm{a}(t)\Vert ^2\nonumber \\&-\,\lambda _2\gamma ^2\bigg \Vert \tilde{w}_\mathrm{a}^T(t)+\sum _{j=1}^{l} \phi _{a_j}(t)C_j(t)e_{a_j}^T(t)\bigg \Vert ^2\nonumber \\&-\,2\lambda _2\gamma ^2\left( 1-\alpha \Vert C(t)\Vert ^2\Vert \phi _\mathrm{a}(t)\Vert ^2\right) \bigg \Vert \hat{w}_\mathrm{c}^T(t)\phi _\mathrm{c}(t)\bigg \Vert ^2\nonumber \\&+\,\mathfrak {L}^2(t) \end{aligned}$$
(34)

where

$$\begin{aligned} \mathfrak {L}^2(t)=&\, \lambda _1\left( 1+\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) \bigg \Vert \gamma w^{T}_\mathrm{c}(t)\phi _\mathrm{c}(t)+r(t)\nonumber \\&-\,\hat{w}^T_\mathrm{c}(t-1)\phi _\mathrm{c}(t-1)\bigg \Vert ^2\nonumber \\&+\,\lambda _1\left( 1+3\beta \gamma ^2\right) \bigg \Vert \sum _{j=1}^{l}\phi _{c_j}(t)e_{c_j}^T(t)\bigg \Vert ^2\nonumber \\&+\,\lambda _1\gamma ^2\Vert \tilde{w}_\mathrm{c}(t)\Vert ^2 +\lambda _2\gamma ^2\Vert \tilde{w}_\mathrm{a}(t)\Vert ^2\nonumber \\&+\,2\lambda _2\gamma ^2\Vert w_\mathrm{c}^T\phi _\mathrm{c}(t)\Vert ^2\nonumber \\&+\,\lambda _2\gamma ^2\bigg \Vert \xi _\mathrm{a}(t)C(t)-\hat{w}_\mathrm{c}^T(t)\phi _\mathrm{c}(t)\bigg \Vert ^2\nonumber \\&+\,\lambda _2\gamma ^2(1+2\alpha )\bigg \Vert \sum _{j=1}^{l}\phi _{a_j} (t)C_j(t)e_{a_j}^T(t)\bigg \Vert ^2 \end{aligned}$$
(35)

With Assumption 1 and Cauchy–Schwarz inequality, we derive from (35)

$$\begin{aligned} \mathfrak {L}^2(t)\le&\, 3\lambda _1\left[ 2l^2+\beta \gamma ^2(1+6l^2 +\gamma ^2)\right] w_\mathrm{cm}^2\phi _\mathrm{cm}^4\nonumber \\&+\,\left[ 3\lambda _1+\gamma ^2(3\lambda _1+4\lambda _2)\right. \nonumber \\&+\,\left. \lambda _2\gamma ^{2}l^2(1+2\alpha )\phi _\mathrm{am}^2C_m^2\right] w_\mathrm{cm}^2\phi _\mathrm{cm}^2\nonumber \\&+\,3\lambda _1\left[ 1+l^2+\beta \gamma ^2(\phi _\mathrm{cm}^2+9l^2)\right] r_m^2 +\lambda _1\gamma ^2w_\mathrm{cm}^2\nonumber \\&+\,\gamma ^2(\lambda _1+2\lambda _2C_m^2\phi _\mathrm{am}^2)w_\mathrm{am}^2\triangleq \mathfrak {L}_m^2 \end{aligned}$$
(36)

where \(r_m\) and \(C_m\) are the upper bounds of \(\Vert r(t)\Vert \) and \(\Vert C(t)\Vert \), respectively, i.e., \(\Vert r(t)\Vert \le r_m\), \(\Vert C(t)\Vert \le C_m\).

Select the parameters to satisfy

$$\begin{aligned} 0<\alpha \Vert C(t)\Vert ^2\Vert \phi _\mathrm{a}(t)\Vert ^2<1,\quad 0<\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2<1 \end{aligned}$$
(37)

and the parameters \(\lambda _i>0\ (i=1,2)\) are chosen as

$$\begin{aligned} \frac{\lambda _1}{\lambda _2}>\frac{2}{1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2} \end{aligned}$$
(38)
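Conditions (37) and (38) are straightforward to verify numerically for a candidate parameter set. A small helper, with illustrative (hypothetical) values for the activation and bounding constants, might look like:

```python
def conditions_hold(alpha, beta, gamma, lam1, lam2,
                    phi_a_norm2, phi_c_norm2, C_norm2):
    """Check step-size conditions (37) and weighting condition (38),
    given squared-norm bounds for phi_a, phi_c, and C."""
    c37a = 0 < alpha * C_norm2 * phi_a_norm2 < 1
    c37b = 0 < beta * gamma ** 2 * phi_c_norm2 < 1
    if not (c37a and c37b):
        return False
    # (38): lambda_1 / lambda_2 > 2 / (1 - beta * gamma^2 * ||phi_c||^2)
    return lam1 / lam2 > 2.0 / (1.0 - beta * gamma ** 2 * phi_c_norm2)
```

For instance, with bounded (e.g. tanh) activations of six hidden neurons one may take \(\Vert \phi \Vert ^2\le 6\); then \(\alpha =\beta =0.01\), \(\gamma =0.95\), \(\lambda _1=10\), \(\lambda _2=1\) satisfies all three conditions, while a large critic step size such as \(\beta =1\) violates (37).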

Then, if (37) and (38) hold, for any

$$\begin{aligned} \Vert \xi _\mathrm{c}(t)\Vert >&\sqrt{\frac{\mathfrak {L}_m^2}{\gamma ^2\left[ \lambda _1 \left( 1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) -2\lambda _2\right] }} \end{aligned}$$
(39)
$$\begin{aligned} \Vert \xi _\mathrm{a}(t)\Vert >&\sqrt{\frac{\mathfrak {L}_m^2}{\lambda _2\gamma ^2\Vert C(t)\Vert ^2}} \end{aligned}$$
(40)

the first difference \(\Delta L(t)<0\) holds.

Note that \(\Vert \xi _\mathrm{c}(t)\Vert \le \Vert \tilde{w}_\mathrm{c}(t)\Vert \phi _\mathrm{cm}\) and \(\Vert \xi _\mathrm{a}(t)\Vert \le \Vert \tilde{w}_\mathrm{a}(t)\Vert \phi _\mathrm{am}\); then, by (39) and (40), we have

$$\begin{aligned} \Vert \tilde{w}_\mathrm{c}(t)\Vert>&\frac{1}{\phi _\mathrm{cm}}\sqrt{\frac{\mathfrak {L}_m^2}{\gamma ^2\left[ \lambda _1\left( 1-\beta \gamma ^2\Vert \phi _\mathrm{c}(t)\Vert ^2\right) -2\lambda _2\right] }}\triangleq \mathfrak {B}_\mathrm{c}\\ \Vert \tilde{w}_\mathrm{a}(t)\Vert >&\frac{1}{\phi _\mathrm{am}} \sqrt{\frac{\mathfrak {L}_m^2}{\lambda _2\gamma ^2\Vert C(t)\Vert ^2}}\triangleq \mathfrak {B}_\mathrm{a} \end{aligned}$$

With the standard Lyapunov extension theorem, we can conclude that the weight estimation errors of the critic network \(\tilde{w}_\mathrm{c}(t)\) and the actor network \(\tilde{w}_\mathrm{a}(t)\) are uniformly ultimately bounded (UUB) by the positive constants \(\mathfrak {B}_\mathrm{c}\) and \(\mathfrak {B}_\mathrm{a}\).
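Given numerical values for \(\mathfrak {L}_m^2\) and the various bounds, \(\mathfrak {B}_\mathrm{c}\) and \(\mathfrak {B}_\mathrm{a}\) can be evaluated directly from (39) and (40). The constants used in the example below are hypothetical placeholders, not values from the paper:

```python
import math

def uub_bounds(Lm2, phi_cm, phi_am, gamma, beta, lam1, lam2,
               phi_c_norm2, C_norm2):
    """Evaluate the ultimate bounds B_c and B_a from (39)-(40)."""
    denom_c = gamma ** 2 * (lam1 * (1 - beta * gamma ** 2 * phi_c_norm2)
                            - 2 * lam2)
    if denom_c <= 0:
        raise ValueError("conditions (37)-(38) must hold")
    B_c = math.sqrt(Lm2 / denom_c) / phi_cm
    B_a = math.sqrt(Lm2 / (lam2 * gamma ** 2 * C_norm2)) / phi_am
    return B_c, B_a

# Hypothetical constants consistent with (37)-(38)
B_c, B_a = uub_bounds(Lm2=1.0, phi_cm=2.0, phi_am=2.0, gamma=0.95,
                      beta=0.01, lam1=10.0, lam2=1.0,
                      phi_c_norm2=6.0, C_norm2=1.0)
```

As the expressions show, a smaller \(\mathfrak {L}_m^2\) or a larger margin in (37)-(38) shrinks both ultimate bounds.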

The proof is completed.


Cite this article

Wang, B., Zhao, D. & Cheng, J. Adaptive cruise control via adaptive dynamic programming with experience replay. Soft Comput 23, 4131–4144 (2019). https://doi.org/10.1007/s00500-018-3063-7
