Robust Dynamic Pricing in Online Markets with Reinforcement Learning

Zhang, Bolei; Xiao, Fu

doi:10.1007/978-3-031-00126-0_48

Conference paper
First Online: 08 April 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13246)

Abstract

In online markets, reinforcement learning (RL) is a promising approach to dynamic pricing because it can maximize long-term cumulative return. However, directly optimizing RL policies in online markets can be costly: RL requires trial-and-error interaction with the environment, which may lead to drastic revenue loss. In this paper, we propose a robust dynamic pricing algorithm using RL. The main idea is to train the dynamic pricing policy in an adversarial simulation environment built with a generative adversarial framework. In this framework, the generator is trained to 1) imitate real customers' behaviors and 2) generate adversarial behaviors. The algorithm is proved to converge under certain assumptions. The experimental results show that our algorithm is comparable to an algorithm trained directly in the real environment. Moreover, it significantly outperforms the other baselines in different scenarios.
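The abstract gives no implementation details, so the following is only a toy, one-step sketch of the max-min idea behind training a pricing policy against an adversarial simulator. It is not the paper's algorithm: there is no RL loop and no learned generator. The linear demand model, the candidate prices, and every name and number below are assumptions made for illustration.

```python
# Toy illustration only: one-step robust pricing against an adversarial
# demand simulator. All names and numbers are hypothetical and are not
# taken from the paper.

def purchase_prob(price, sensitivity, base=1.0):
    """Assumed linear demand: probability a simulated customer buys at `price`."""
    return min(1.0, max(0.0, base - sensitivity * price))

def expected_revenue(price, sensitivity):
    """Expected one-step revenue for a given price and customer sensitivity."""
    return price * purchase_prob(price, sensitivity)

def worst_case_revenue(price, sensitivities):
    """The adversary picks the customer behavior that hurts revenue most."""
    return min(expected_revenue(price, s) for s in sensitivities)

def nominal_price(prices, sensitivity):
    """Best price if the simulator exactly matched one assumed behavior."""
    return max(prices, key=lambda p: expected_revenue(p, sensitivity))

def robust_price(prices, sensitivities):
    """Max-min choice: best price under the adversary's worst response."""
    return max(prices, key=lambda p: worst_case_revenue(p, sensitivities))

PRICES = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical candidate prices
NOMINAL = 0.1                           # hypothetical imitated sensitivity
ADVERSARIAL_SET = [0.1, 0.2, 0.25]      # behaviors the adversary may choose

if __name__ == "__main__":
    print("nominal price:", nominal_price(PRICES, NOMINAL))
    print("robust price:", robust_price(PRICES, ADVERSARIAL_SET))
```

In this toy setting the price that looks best under the nominal behavior earns nothing against the most price-sensitive behavior, while the max-min price gives up some nominal revenue for a guaranteed floor. That trade-off is the intuition the abstract's "robust" framing points at.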

Notes

1. JD is one of the largest retail companies in China, www.jd.com.

Acknowledgements

This research was supported by the National Key Research and Development Program of China (No. 2019YFB2101704), the Natural Science Foundation of Jiangsu Province (No. BK20200752), and the NUPTSF (No. NY220080).

Author information

Authors and Affiliations

School of Computer, Nanjing University of Posts and Telecommunications, Nanjing, People’s Republic of China
Bolei Zhang & Fu Xiao
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, People’s Republic of China
Bolei Zhang

Corresponding author

Correspondence to Fu Xiao.

Editor information

Editors and Affiliations

Dept. of Computer Science & Engr., Indian Institute of Technology, Kanpur, Uttar Pradesh, India
Arnab Bhattacharya
National University of Singapore, Singapore, Singapore
Janice Lee Mong Li
University of California, Santa Barbara, Santa Barbara, CA, USA
Divyakant Agrawal
IIIT Hyderabad, Hyderabad, India
P. Krishna Reddy
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Mukesh Mohania
Ashoka University, Sonepat, Haryana, India
Anirban Mondal
Indraprastha Institute of Information Te, New Delhi, India
Vikram Goyal
University of Aizu, Aizu, Japan
Rage Uday Kiran

About this paper

Cite this paper

Zhang, B., Xiao, F. (2022). Robust Dynamic Pricing in Online Markets with Reinforcement Learning. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13246. Springer, Cham. https://doi.org/10.1007/978-3-031-00126-0_48

DOI: https://doi.org/10.1007/978-3-031-00126-0_48
Published: 08 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00125-3
Online ISBN: 978-3-031-00126-0
eBook Packages: Computer Science, Computer Science (R0)
