Abstract
Developing applications for offline reinforcement learning evaluation faces several challenges: integrating heterogeneous data and algorithms, providing user-friendly interfaces, and managing resources flexibly. This paper designs and implements ORLEP, an efficient platform that provides high-level services for offline reinforcement learning evaluation. ORLEP integrates a highly concurrent and reliable underlying infrastructure, distributedly deployed core components, and third-party libraries and benchmarks. On top of these, it supplies high-level abstractions for (1) data management, (2) model training and evaluation, (3) result visualization, and (4) resource configuration and supervision. Finally, this paper validates ORLEP on specific cases, and the results demonstrate its performance and scalability.
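The abstract does not describe ORLEP's programming interface. The following is a purely hypothetical sketch of how the four abstractions could be combined into a single evaluation job that a worker drains from a queue. Every class and function name is invented for illustration. An in-process `queue.Queue` stands in for whatever distributed messaging a real deployment would use. The dataset name `hopper-medium-v2` and the algorithm label `CQL` are only example strings.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Callable, Dict, List


# All names here are hypothetical; they are not ORLEP's actual API.
@dataclass
class ResourceConfig:
    """(4) Resource configuration attached to a job (not enforced in this sketch)."""
    gpus: int = 0
    max_minutes: int = 60


@dataclass
class EvalJob:
    dataset: str        # (1) data management: which logged dataset to use
    algorithm: str      # (2) which offline RL algorithm to train and evaluate
    seeds: List[int]    # independent runs, one score per seed
    resources: ResourceConfig = field(default_factory=ResourceConfig)


def run_worker(
    jobs: "Queue[EvalJob]",
    train_and_eval: Callable[[EvalJob, int], float],
) -> Dict[str, List[float]]:
    """Drain the queue, run every seed of every job, and collect per-seed scores
    keyed by 'algorithm@dataset' so they can be plotted later ((3) visualization)."""
    results: Dict[str, List[float]] = {}
    while not jobs.empty():
        job = jobs.get()
        key = f"{job.algorithm}@{job.dataset}"
        results[key] = [train_and_eval(job, seed) for seed in job.seeds]
        jobs.task_done()
    return results


if __name__ == "__main__":
    def fake_train_and_eval(job: EvalJob, seed: int) -> float:
        # Stand-in for a real training/evaluation run; returns a dummy score.
        return float(seed)

    q: "Queue[EvalJob]" = Queue()
    q.put(EvalJob("hopper-medium-v2", "CQL", seeds=[0, 1, 2]))
    print(run_worker(q, fake_train_and_eval))
    # {'CQL@hopper-medium-v2': [0.0, 1.0, 2.0]}
```

Keeping the job description declarative, separate from the worker that executes it, means data selection, algorithm choice, and resource limits can be recorded and dispatched independently. In a real system, the `fake_train_and_eval` stand-in would be replaced by a call into an actual offline RL library.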
Data Availability
The data are openly available in a public repository.
Acknowledgements
All authors contributed to the study conception and design. Material preparation, analysis, and writing of the original draft were performed by Chen Chen. Resources, supervision, and funding acquisition were provided by Mao Keming. Material preparation, software, and investigation were performed by Zhang Jinkai. Data collection and testing were performed by Li Yiyang. All authors commented on previous versions of the manuscript, and all authors read and approved the final manuscript.
Funding
This work was supported by the Natural Science Foundation of Liaoning Province, China (No. 2022-MS-112).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mao, K., Chen, C., Zhang, J. et al. ORLEP: an efficient offline reinforcement learning evaluation platform. Multimed Tools Appl 83, 37073–37087 (2024). https://doi.org/10.1007/s11042-023-16906-5
DOI: https://doi.org/10.1007/s11042-023-16906-5