Research Article
DOI: 10.1145/3640824.3640871

Evaluating and Selecting Deep Reinforcement Learning Models for Optimal Dynamic Pricing: A Systematic Comparison of PPO, DDPG, and SAC

Published: 08 March 2024

Abstract

Given the plethora of available solutions, choosing an appropriate Deep Reinforcement Learning (DRL) model for dynamic pricing poses a significant challenge for practitioners. While many DRL solutions claim superior performance, a standardized framework for evaluating them is lacking. Addressing this gap, we introduce a novel framework and a set of metrics for systematically selecting and assessing DRL models. To validate the utility of our framework, we critically compared three representative DRL models, emphasizing their performance on dynamic pricing tasks. To further ensure the robustness of our assessment, we benchmarked these models against a well-established human agent policy. The DRL model that emerged as the most effective was rigorously tested on an Amazon dataset, demonstrating a notable performance boost of 5.64%. Our findings underscore the value of the proposed metrics and framework in guiding practitioners towards the most suitable DRL solution for dynamic pricing.
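To make the kind of comparison described above concrete, the sketch below sets up a toy dynamic-pricing Markov Decision Process and trains PPO, DDPG, and SAC on it. This is a minimal illustration, not the paper's framework, environment, or metrics: it assumes a Gymnasium-style environment and the Stable-Baselines3 implementations of the three algorithms, and the constant-elasticity demand model, price range, inventory level, and training budget are all illustrative choices. Average episodic revenue stands in for the evaluation metrics proposed in the paper.

# Minimal comparison sketch (not the paper's code): a toy dynamic-pricing MDP
# with a constant-elasticity demand model, trained with PPO, DDPG, and SAC via
# Stable-Baselines3. All environment parameters here are illustrative.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO, DDPG, SAC


class PricingEnv(gym.Env):
    """Finite-horizon pricing: pick a price each step, earn revenue on
    stochastic demand that falls as price rises (price elasticity)."""

    def __init__(self, horizon=30, inventory=300, elasticity=1.5, base_demand=20.0):
        super().__init__()
        self.horizon, self.init_inventory = horizon, inventory
        self.elasticity, self.base_demand = elasticity, base_demand
        # Action: a price in [1, 10]; observation: [steps left, stock left].
        self.action_space = spaces.Box(low=1.0, high=10.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(2,), dtype=np.float32)

    def _obs(self):
        return np.array([self.t_left, self.stock], dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t_left, self.stock = self.horizon, self.init_inventory
        return self._obs(), {}

    def step(self, action):
        price = float(np.clip(action[0], 1.0, 10.0))
        # Constant-elasticity demand curve with Poisson noise.
        mean_demand = self.base_demand * price ** (-self.elasticity)
        sales = min(int(self.np_random.poisson(mean_demand)), self.stock)
        self.stock -= sales
        self.t_left -= 1
        reward = price * sales  # revenue earned this step
        terminated = bool(self.t_left == 0 or self.stock == 0)
        return self._obs(), reward, terminated, False, {}


def average_revenue(model, env, episodes=50):
    """Mean episodic revenue under the learned policy (deterministic actions)."""
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    return total / episodes


if __name__ == "__main__":
    results = {}
    for name, algo in [("PPO", PPO), ("DDPG", DDPG), ("SAC", SAC)]:
        model = algo("MlpPolicy", PricingEnv(), verbose=0, seed=0)
        model.learn(total_timesteps=50_000)
        results[name] = average_revenue(model, PricingEnv())
    print(results)  # e.g. {"PPO": ..., "DDPG": ..., "SAC": ...}

Under this simplified setup, the ranking of the three algorithms may well differ from the paper's results, since the demand model, reward, and evaluation metric are stand-ins rather than the authors' benchmark.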




        Published In

        CCEAI '24: Proceedings of the 2024 8th International Conference on Control Engineering and Artificial Intelligence
        January 2024
        297 pages
        ISBN:9798400707971
        DOI:10.1145/3640824
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 08 March 2024


        Author Tags

        1. DDPG (Deep Deterministic Policy Gradient)
        2. Deep Reinforcement Learning (DRL)
        3. Dynamic Pricing
        4. E-commerce
        5. Inventory Management
        6. Markov Decision Process
        7. Model Evaluation
        8. PPO (Proximal Policy Optimization)
        9. Price Elasticity of Demand
        10. SAC (Soft Actor-Critic)

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • XJTLU Key Program Special Fund (KSF-A-17).
        • XJTLU-REF-21-01-002

        Conference

        CCEAI 2024


        Article Metrics

• Downloads (last 12 months): 247
• Downloads (last 6 weeks): 32
        Reflects downloads up to 16 Feb 2025


        Cited By

        • Non-Stationary Transformer Architecture: A Versatile Framework for Recommendation Systems. Electronics 13(11), 2075 (2024). DOI: 10.3390/electronics13112075. Online publication date: 27 May 2024.
        • Choosing the Right Path: A Comparative Study of CNN, RNN, and Transformer Models for Sequential Recommendation Systems. 2024 International Conference on Platform Technology and Service (PlatCon), 77–82 (2024). DOI: 10.1109/PlatCon63925.2024.10830671. Online publication date: 26 Aug 2024.
        • Recommendation Systems with Non-stationary Transformer. 2024 29th International Conference on Automation and Computing (ICAC), 1–6 (2024). DOI: 10.1109/ICAC61394.2024.10718828. Online publication date: 28 Aug 2024.
        • Integrating Vision Localization and Deep Reinforcement Learning for High-Precision, Low-Cost Peg-in-Hole Assembly. 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM), 138–143 (2024). DOI: 10.1109/CIS-RAM61939.2024.10673040. Online publication date: 8 Aug 2024.
        • Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments. IEEE Access 12, 146795–146806 (2024). DOI: 10.1109/ACCESS.2024.3472473. Online publication date: 2024.
        • Revolutionising Financial Portfolio Management: The Non-Stationary Transformer's Fusion of Macroeconomic Indicators and Sentiment Analysis in a Deep Reinforcement Learning Framework. Applied Sciences 14(1), 274 (2023). DOI: 10.3390/app14010274. Online publication date: 28 Dec 2023.
