Research Article
DOI: 10.1145/3640824.3640871

Evaluating and Selecting Deep Reinforcement Learning Models for Optimal Dynamic Pricing: A Systematic Comparison of PPO, DDPG, and SAC

Published: 08 March 2024

Abstract

Given the plethora of available solutions, choosing an appropriate Deep Reinforcement Learning (DRL) model for dynamic pricing poses a significant challenge for practitioners. While many DRL solutions claim superior performance, a standardized framework for evaluating them is lacking. Addressing this gap, we introduce a novel framework and a set of metrics for systematically selecting and assessing DRL models. To validate the utility of our framework, we critically compared three representative DRL models, emphasizing their performance on dynamic pricing tasks. To further ensure the robustness of our assessment, we benchmarked these models against a well-established human agent policy. The DRL model that emerged as the most effective was rigorously tested on an Amazon dataset, demonstrating a notable performance boost of 5.64%. Our findings underscore the value of the proposed metrics and framework in guiding practitioners towards the most suitable DRL solution for dynamic pricing.
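To make the kind of comparison described above concrete, the sketch below sets up a toy dynamic-pricing Markov Decision Process and trains PPO, DDPG, and SAC on it. This is a minimal illustration, not the paper's framework, environment, or metrics: it assumes a Gymnasium-style environment and the Stable-Baselines3 implementations of the three algorithms, and the constant-elasticity demand model, price range, inventory level, and training budget are all illustrative choices. Average episodic revenue stands in for the evaluation metrics proposed in the paper.

# Minimal comparison sketch (not the paper's code): a toy dynamic-pricing MDP
# with a constant-elasticity demand model, trained with PPO, DDPG, and SAC via
# Stable-Baselines3. All environment parameters here are illustrative.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO, DDPG, SAC


class PricingEnv(gym.Env):
    """Finite-horizon pricing: pick a price each step, earn revenue on
    stochastic demand that falls as price rises (price elasticity)."""

    def __init__(self, horizon=30, inventory=300, elasticity=1.5, base_demand=20.0):
        super().__init__()
        self.horizon, self.init_inventory = horizon, inventory
        self.elasticity, self.base_demand = elasticity, base_demand
        # Action: a price in [1, 10]; observation: [steps left, stock left].
        self.action_space = spaces.Box(low=1.0, high=10.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(2,), dtype=np.float32)

    def _obs(self):
        return np.array([self.t_left, self.stock], dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t_left, self.stock = self.horizon, self.init_inventory
        return self._obs(), {}

    def step(self, action):
        price = float(np.clip(action[0], 1.0, 10.0))
        # Constant-elasticity demand curve with Poisson noise.
        mean_demand = self.base_demand * price ** (-self.elasticity)
        sales = min(int(self.np_random.poisson(mean_demand)), self.stock)
        self.stock -= sales
        self.t_left -= 1
        reward = price * sales  # revenue earned this step
        terminated = bool(self.t_left == 0 or self.stock == 0)
        return self._obs(), reward, terminated, False, {}


def average_revenue(model, env, episodes=50):
    """Mean episodic revenue under the learned policy (deterministic actions)."""
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    return total / episodes


if __name__ == "__main__":
    results = {}
    for name, algo in [("PPO", PPO), ("DDPG", DDPG), ("SAC", SAC)]:
        model = algo("MlpPolicy", PricingEnv(), verbose=0, seed=0)
        model.learn(total_timesteps=50_000)
        results[name] = average_revenue(model, PricingEnv())
    print(results)  # e.g. {"PPO": ..., "DDPG": ..., "SAC": ...}

Under this simplified setup, the ranking of the three algorithms may well differ from the paper's results, since the demand model, reward, and evaluation metric are stand-ins rather than the authors' benchmark.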




        Published In

        CCEAI '24: Proceedings of the 2024 8th International Conference on Control Engineering and Artificial Intelligence
        January 2024
        297 pages
        ISBN:9798400707971
        DOI:10.1145/3640824
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 08 March 2024


        Author Tags

        1. DDPG (Deep Deterministic Policy Gradient)
        2. Deep Reinforcement Learning (DRL)
        3. Dynamic Pricing
        4. E-commerce
        5. Inventory Management
        6. Markov Decision Process
        7. Model Evaluation
        8. PPO (Proximal Policy Optimization)
        9. Price Elasticity of Demand
        10. SAC (Soft Actor-Critic)

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • XJTLU Key Program Special Fund (KSF-A-17).
        • XJTLU-REF-21-01-002

        Conference

        CCEAI 2024


        Article Metrics

• Downloads (last 12 months): 247
• Downloads (last 6 weeks): 32
        Reflects downloads up to 16 Feb 2025


        Cited By

        • Non-Stationary Transformer Architecture: A Versatile Framework for Recommendation Systems. Electronics 13(11), 2075 (2024). DOI: 10.3390/electronics13112075. Online publication date: 27 May 2024.
        • Choosing the Right Path: A Comparative Study of CNN, RNN, and Transformer Models for Sequential Recommendation Systems. 2024 International Conference on Platform Technology and Service (PlatCon), 77–82 (2024). DOI: 10.1109/PlatCon63925.2024.10830671. Online publication date: 26 Aug 2024.
        • Recommendation Systems with Non-stationary Transformer. 2024 29th International Conference on Automation and Computing (ICAC), 1–6 (2024). DOI: 10.1109/ICAC61394.2024.10718828. Online publication date: 28 Aug 2024.
        • Integrating Vision Localization and Deep Reinforcement Learning for High-Precision, Low-Cost Peg-in-Hole Assembly. 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM), 138–143 (2024). DOI: 10.1109/CIS-RAM61939.2024.10673040. Online publication date: 8 Aug 2024.
        • Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments. IEEE Access 12, 146795–146806 (2024). DOI: 10.1109/ACCESS.2024.3472473. Online publication date: 2024.
        • Revolutionising Financial Portfolio Management: The Non-Stationary Transformer's Fusion of Macroeconomic Indicators and Sentiment Analysis in a Deep Reinforcement Learning Framework. Applied Sciences 14(1), 274 (2023). DOI: 10.3390/app14010274. Online publication date: 28 Dec 2023.
