Abstract
The adaptive cruise control (ACC) problem can be transformed into an optimal tracking control problem for complex nonlinear systems. In this paper, a novel, highly efficient model-free adaptive dynamic programming (ADP) approach with experience replay is proposed to design the ACC controller. Experience replay increases data efficiency by recording the available driving data and repeatedly presenting them to the learning procedure of the acceleration controller in the ACC system. The learning framework that combines ADP with experience replay is described in detail. The distinguishing feature of the algorithm is that, when estimating the parameters of the critic network and the actor network with gradient rules, the gradients of historical data and current data are used to update the parameters concurrently. It is proved with Lyapunov theory that the weight estimation errors of the actor network and the critic network are uniformly ultimately bounded under the novel weight update rules. Experiments on the learning performance of the ACC controller implemented with this ADP algorithm clearly demonstrate that experience replay increases data efficiency significantly, and the approximate optimality and adaptability of the learned control policy are verified in typical driving scenarios.
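As a minimal sketch of this concurrent update idea (not the paper's exact formulation), the following Python fragment combines the gradient on the current driving sample with gradients on a replayed mini-batch of recorded samples; the buffer size, learning rates, and the gradient callables `critic_grad` and `actor_grad` are hypothetical placeholders.

```python
import random
import numpy as np

class ReplayADP:
    """Sketch: actor-critic weight updates that mix the gradient on the
    current transition with gradients on replayed historical transitions."""

    def __init__(self, n_critic, n_actor, lr_c=0.01, lr_a=0.01,
                 buffer_size=1000, batch_size=16):
        self.w_c = np.zeros(n_critic)   # critic network weights
        self.w_a = np.zeros(n_actor)    # actor network weights
        self.lr_c, self.lr_a = lr_c, lr_a
        self.buffer = []                # recorded driving data
        self.buffer_size, self.batch_size = buffer_size, batch_size

    def store(self, transition):
        """Record a driving sample (state, action, reward, next state)."""
        self.buffer.append(transition)
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)          # discard the oldest sample

    def update(self, transition, critic_grad, actor_grad):
        """One gradient step on the current sample plus a replayed batch."""
        self.store(transition)
        batch = random.sample(self.buffer,
                              min(self.batch_size, len(self.buffer)))
        # Current-data and historical-data gradients are summed, so each
        # recorded sample keeps shaping the weights (concurrent update).
        g_c = critic_grad(self.w_c, transition) \
            + sum(critic_grad(self.w_c, t) for t in batch)
        g_a = actor_grad(self.w_a, self.w_c, transition) \
            + sum(actor_grad(self.w_a, self.w_c, t) for t in batch)
        self.w_c -= self.lr_c * g_c
        self.w_a -= self.lr_a * g_a
```

Summing the replayed gradients with the current one lets each recorded sample contribute to many updates, which is the source of the data efficiency claimed above.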
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Nos. 61603150, 61273136, 61573353 and 61533017), the National Key Research and Development Plan (No. 2016YFB0101000), and the Doctoral Foundation of the University of Jinan (No. XBS1605).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Appendices
A Proof of Lemma 1
The first difference of \(L_1(t)\) is
With the basic property of the matrix trace, \(\mathrm{tr}(AB)=\mathrm{tr}(BA)\), we have
Substituting (9) into (26), we obtain
where
With (7), we have
For simplicity, we denote
then
Substituting \(\mathfrak {R}_1(t)\) and \(\mathfrak {R}_2(t)\) into (27), we obtain
Applying the inequality \(\Vert x+y\Vert ^2\le 2\Vert x\Vert ^2+2\Vert y\Vert ^2\), which follows from the Cauchy–Schwarz inequality, we can derive (20).
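To see why this bound holds, expand the square: \(\Vert x+y\Vert ^2=\Vert x\Vert ^2+2\langle x,y\rangle +\Vert y\Vert ^2\), and bound the cross term by \(2\langle x,y\rangle \le \Vert x\Vert ^2+\Vert y\Vert ^2\), which follows from \(\Vert x-y\Vert ^2\ge 0\).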
The proof is completed.
B Proof of Lemma 2
The first difference of \(L_2(t)\) is
Substituting (15) into (31), we have
By the Cauchy–Schwarz inequality, we can derive (22).
The proof is completed.
C Proof of Theorem 1
Consider the following Lyapunov function candidate
where
The first difference of L(t) is
By employing Lemma 1 and Lemma 2, we obtain
where
With Assumption 1 and the Cauchy–Schwarz inequality, we derive from (35)
where \(r_m\) and \(C_m\) are the upper bounds of \(\Vert r(t)\Vert \) and \(\Vert C(t)\Vert \), respectively, that is, \(\Vert r(t)\Vert \le r_m\) and \(\Vert C(t)\Vert \le C_m\).
Select the parameters to satisfy
and the parameters \(\lambda _i>0\ (i=1,2)\) are chosen as
Then, if (37) and (38) hold, for any
the first difference satisfies \(\Delta L(t)<0\).
Note that \(\Vert \xi _\mathrm{c}(t)\Vert \le \Vert \tilde{w}_\mathrm{c}(t)\Vert \Vert \phi _\mathrm{cm}\Vert \) and \(\Vert \xi _\mathrm{a}(t)\Vert \le \Vert \tilde{w}_\mathrm{a}(t)\Vert \Vert \phi _\mathrm{am}\Vert \), then by (39) and (40), we have
With the standard Lyapunov extension theorem, we can conclude that the weight estimation errors of the critic network, \(\tilde{w}_\mathrm{c}(t)\), and the actor network, \(\tilde{w}_\mathrm{a}(t)\), are uniformly ultimately bounded (UUB) with bounds given by the positive constants \(\mathfrak {B}_\mathrm{c}\) and \(\mathfrak {B}_\mathrm{a}\).
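In the standard sense, this means there exists a finite time \(T\), independent of the initial time \(t_0\), such that \(\Vert \tilde{w}_\mathrm{c}(t)\Vert \le \mathfrak {B}_\mathrm{c}\) and \(\Vert \tilde{w}_\mathrm{a}(t)\Vert \le \mathfrak {B}_\mathrm{a}\) for all \(t\ge t_0+T\).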
The proof is completed.
Cite this article
Wang, B., Zhao, D. & Cheng, J. Adaptive cruise control via adaptive dynamic programming with experience replay. Soft Comput 23, 4131–4144 (2019). https://doi.org/10.1007/s00500-018-3063-7