Abstract
In online markets, reinforcement learning (RL) is a promising approach to dynamic pricing because of its ability to maximize long-term cumulative return. However, optimizing RL policies directly in online markets can be costly: RL learns by trial and error with the environment, which may lead to drastic revenue loss. In this paper, we propose a robust dynamic pricing algorithm based on RL. The main idea is to train the dynamic pricing policy in an adversarial simulation environment built with a generative adversarial framework, in which the generator is trained to 1) imitate real customers' behaviors and 2) generate adversarial behaviors. The algorithm is proven to converge under certain assumptions. Experimental results show that our algorithm performs comparably to an algorithm trained directly in the real environment and significantly outperforms other baselines in different scenarios.
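The abstract describes the method only at a high level. As a rough illustration of the core idea (train a pricing policy against a simulator that can also behave adversarially, instead of experimenting on live customers), the following is a minimal sketch of tabular max-min Q-learning on a hypothetical toy market. It is not the authors' algorithm: their GAN-trained customer generator is replaced by a hand-written deterministic demand function (`step`), and "adversarial behavior" is reduced to an adversary that, after seeing the price, chooses one of two customer price sensitivities. The state is (remaining inventory, period). All names and constants (`PRICES`, `SENSITIVITY`, `BASE_DEMAND`, `MAX_INV`, `HORIZON`) are illustrative assumptions.

```python
import itertools
import random

# Hypothetical toy market: every constant below is an illustrative assumption.
PRICES = [4, 6, 8]          # seller's discrete price menu
SENSITIVITY = [1.0, 1.5]    # adversary's options: how strongly demand reacts to price
MAX_INV = 5                 # starting inventory
HORIZON = 3                 # number of pricing periods
BASE_DEMAND = 10.0          # demand if the price were zero


def step(inventory, price, sens):
    """Deterministic stand-in for a learned customer simulator (not the paper's generator)."""
    demand = max(0, int(BASE_DEMAND - sens * price))
    sold = min(demand, inventory)
    return price * sold, inventory - sold  # (revenue, remaining inventory)


def robust_value(q, inv, t):
    """Max-min value: the seller picks a price, then the adversary picks its worst response."""
    if t == HORIZON:
        return 0.0
    return max(min(q[(inv, t, p, s)] for s in SENSITIVITY) for p in PRICES)


def greedy_price(q, inv, t):
    """Price whose worst-case Q-value is highest."""
    return max(PRICES, key=lambda p: min(q[(inv, t, p, s)] for s in SENSITIVITY))


def robust_q_learning(iters=20000, alpha=1.0, seed=0):
    """Tabular Q-learning with a max-min target, sampling transitions from the simulator."""
    rng = random.Random(seed)
    keys = list(itertools.product(range(MAX_INV + 1), range(HORIZON), PRICES, SENSITIVITY))
    q = {k: 0.0 for k in keys}
    for _ in range(iters):
        inv, t, p, s = rng.choice(keys)  # query the simulator at a random (state, joint action)
        reward, nxt = step(inv, p, s)
        target = reward + robust_value(q, nxt, t + 1)
        # A step size of 1.0 is reasonable only because this toy simulator is deterministic.
        q[(inv, t, p, s)] += alpha * (target - q[(inv, t, p, s)])
    return q


if __name__ == "__main__":
    q = robust_q_learning()
    print(greedy_price(q, MAX_INV, 0), robust_value(q, MAX_INV, 0))
```

With these toy settings the script should print `6 24.0`, which matches solving the same max-min recursion by backward induction. Note that the adversary does not always pick the more price-sensitive customers: at a price of 4, mild customers clear the inventory in the first period and leave nothing to sell later, so they are the worse case for the seller.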
Notes
1. JD (www.jd.com) is one of the largest retail companies in China.
Acknowledgements
This research was supported by the National Key Research and Development Program of China (No. 2019YFB2101704), the Natural Science Foundation of Jiangsu Province (No. BK20200752), and the NUPTSF (No. NY220080).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, B., Xiao, F. (2022). Robust Dynamic Pricing in Online Markets with Reinforcement Learning. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13246. Springer, Cham. https://doi.org/10.1007/978-3-031-00126-0_48
DOI: https://doi.org/10.1007/978-3-031-00126-0_48
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00125-3
Online ISBN: 978-3-031-00126-0
eBook Packages: Computer Science, Computer Science (R0)