Online model-learning algorithm from samples and trajectories

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

Learning the value function and the policy for continuous MDPs is non-trivial because collecting enough data is difficult. Model learning uses the collected data effectively: a model is learned from the data and then used for planning, which accelerates the learning of the value function and the policy. Most existing work on model learning focuses on improving either single-step or multi-step prediction, whereas combining the two may be a better choice. We therefore propose an online algorithm, called Online-ML-ST, in which the model is learned from both single-step samples and multi-step trajectories. Unlike existing work, the trajectories collected during interaction with the environment are used not only to learn the model offline, but also to learn the model, the value function and the policy online. Experiments on two typical continuous benchmarks, Pole Balancing and Inverted Pendulum, show that Online-ML-ST outperforms three other typical methods in learning rate and convergence rate.
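To make the idea concrete, the following is a minimal Dyna-style sketch of learning a dynamics model from both single-step samples and multi-step trajectory rollouts, and then using the model for extra planning updates of a value function. It is not the authors' Online-ML-ST implementation: the toy one-dimensional environment, the linear model, the hand-coded policy, and all update rules below are illustrative assumptions based only on the abstract.

```python
# Minimal Dyna-style sketch: model learning from single-step samples plus
# multi-step trajectories, with planning updates from the learned model.
# All components (environment, linear model, policy, updates) are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def env_step(s, a):
    """Toy 1-D dynamics: s' = s + 0.1*a + noise, reward = -s'^2 (assumed)."""
    s_next = s + 0.1 * a + 0.01 * rng.normal()
    return s_next, -s_next ** 2

class LinearModel:
    """Linear dynamics model s' ~ w0 + w1*s + w2*a, fitted by least squares."""
    def __init__(self):
        self.w = np.zeros(3)
    def fit(self, S, A, S_next):
        X = np.column_stack([np.ones_like(S), S, A])
        self.w, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    def predict(self, s, a):
        return self.w @ np.array([1.0, s, a])

samples = []        # single-step transitions (s, a, s')
trajectories = []   # full episodes, used for multi-step consistency data
model = LinearModel()

grid = np.linspace(-1.0, 1.0, 21)      # coarse grid for a tabular value function
V = np.zeros(21)
to_idx = lambda s: int(np.clip(np.searchsorted(grid, s), 0, 20))
gamma, alpha = 0.95, 0.1

for episode in range(30):
    s, traj = rng.uniform(-1, 1), []
    for t in range(20):
        a = -np.sign(s)                # crude hand-coded policy (assumption)
        s_next, r = env_step(s, a)
        samples.append((s, a, s_next))
        traj.append((s, a, s_next))
        # Direct TD(0) update from the real transition.
        V[to_idx(s)] += alpha * (r + gamma * V[to_idx(s_next)] - V[to_idx(s)])
        s = s_next
    trajectories.append(traj)

    # Refit the model online from single-step samples plus trajectory rollouts:
    # each trajectory contributes (predicted-state, action, observed-next-state)
    # pairs, so the model is also trained against its own multi-step predictions.
    S, A, SN = map(np.array, zip(*samples))
    extra = []
    for tr in trajectories:
        s_hat = tr[0][0]
        for (s_real, a, s_next) in tr:
            extra.append((s_hat, a, s_next))
            s_hat = model.predict(s_hat, a)
    ES, EA, ESN = map(np.array, zip(*extra))
    model.fit(np.concatenate([S, ES]), np.concatenate([A, EA]),
              np.concatenate([SN, ESN]))

    # Planning: extra value-function updates from model-simulated transitions.
    for _ in range(50):
        s_sim = rng.uniform(-1, 1)
        a_sim = -np.sign(s_sim)
        s_next_sim = model.predict(s_sim, a_sim)
        r_sim = -s_next_sim ** 2
        V[to_idx(s_sim)] += alpha * (r_sim + gamma * V[to_idx(s_next_sim)]
                                     - V[to_idx(s_sim)])

print("learned model weights:", model.w)
print("V(0) estimate:", V[to_idx(0.0)])
```

The design point the sketch tries to capture is the combination mentioned in the abstract: real transitions drive direct updates, trajectories supply multi-step targets for the model, and the learned model generates additional simulated experience for planning.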


Acknowledgements

This work was supported by the National Natural Science Foundation of China (61702055), the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04), and the Program of the Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency.

Author information

Corresponding author

Correspondence to Shan Zhong.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhong, S., Fu, Q., Xia, K. et al. Online model-learning algorithm from samples and trajectories. J Ambient Intell Human Comput 11, 527–537 (2020). https://doi.org/10.1007/s12652-018-1133-4

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12652-018-1133-4

Keywords
