Dynamic pricing with real-time demand learning

doi:10.1016/j.ejor.2005.01.041

European Journal of Operational Research

Volume 174, Issue 1, 1 October 2006, Pages 522-538

https://doi.org/10.1016/j.ejor.2005.01.041

Abstract

In many service industries, the firm adjusts the product price dynamically by taking into account the current product inventory and the future demand distribution. Because the firm can easily monitor the product inventory, the success of dynamic pricing relies on an accurate demand forecast. In this paper, we consider a situation where the firm does not have an accurate demand forecast but can only roughly estimate the customer arrival rate before the sale begins. As the sale moves forward, the firm uses real-time sales data to fine-tune this arrival rate estimate. We show how the firm can first use the refined arrival rate estimate to forecast the future demand distribution with better precision, and then use the new information to dynamically adjust the product price in order to maximize the expected total revenue. A numerical study shows that this strategy is not only nearly optimal but also robust when the true customer arrival rate differs substantially from the original forecast. Finally, we extend the results to four situations commonly encountered in practice: unobservable lost customers, a time-dependent arrival rate, batch demand, and a discrete set of allowable prices.

Introduction

Dynamic pricing is a business strategy that adjusts the product price in a timely fashion in order to allocate the right service to the right customer at the right time. The rationale for dynamic pricing can be illustrated with the example of an airline. When an airline sells seats in the same class, it offers different fares depending on the time to departure and the current seat inventory. The airline has an incentive to stimulate sales when the departure time is approaching and many seats are still vacant, because an empty seat is worth nothing after the airplane takes off. On the other hand, the airline still wants to reserve a certain number of seats for last-minute travelers who are willing to pay substantially higher fares. As a consequence, airfares often fluctuate over the selling horizon.

Products such as airline seats are called perishable products, which have three major characteristics: (1) the quantity is fixed and reordering is not possible; (2) there is a deadline for sale; and (3) the marginal cost of selling one more item is small, so most of the revenue goes directly to profit. Because of these characteristics, perishable products are particularly well suited to dynamic pricing. Besides being used extensively in the airline industry, dynamic pricing can also be found in other travel industries, such as hotel rooms [4], rental cars, and cruise lines [11], where it accommodates seasonal fluctuations in demand. Interested readers are referred to survey papers such as [17] and [20] for an overview of dynamic pricing and its role in revenue management.

In general, there are two major sources of randomness in demand: the customer arrival rate and the customer reservation price distribution. Most of the existing literature on dynamic pricing assumes that both are well known before the sale begins. In many service industries, however, although the seller can use historical data to estimate the customer reservation price distribution reasonably well, it is rather difficult to forecast the customer arrival rate accurately before the sale begins. For example, in the travel industry, the demand rates for air travel and for hotel rooms on different days may differ if an event (such as a commencement, a trade show, or a conference) takes place in the destination city. As another example, in the entertainment industry, when a pop singer goes on an international tour, it is relatively easy to know how much a loyal fan is willing to pay for a ticket, but rather difficult to know how many fans there are in each city and how many of them will be aware of the event. In these cases, if the seller roughly estimates the customer arrival rate and dynamically sets the product price based on this rough estimate, he faces a significant risk. If the true customer arrival rate is much lower than the estimated rate, the seller will end up with many unsold items. On the other hand, if the true arrival rate is much higher, the seller will run out of stock quickly and lose the opportunity to take advantage of the excess demand. The dynamic pricing literature does not adequately address this risk.
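The cost of a poor arrival rate estimate can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that reservation prices are exponential with mean theta and that the seller posts one fixed price chosen so that expected demand under the estimated rate just equals the inventory. By Poisson thinning, the number of willing buyers is then Poisson with mean (true rate) x horizon x (1 - F(p)). The function name `leftover_and_lost_sales` and all numbers are hypothetical.

```python
import math


def leftover_and_lost_sales(capacity: int, true_rate: float, horizon: float,
                            buy_prob: float) -> tuple[float, float]:
    """Expected unsold items E[(C - N)^+] and unmet demand E[(N - C)^+].

    N ~ Poisson(true_rate * horizon * buy_prob) is the number of customers
    willing to buy at a fixed posted price (Poisson thinning).
    """
    mean = true_rate * horizon * buy_prob
    pmf = math.exp(-mean)            # P(N = 0)
    leftover = 0.0
    for k in range(capacity):        # only outcomes k < C leave items unsold
        leftover += (capacity - k) * pmf
        pmf *= mean / (k + 1)        # P(N = k + 1) from P(N = k)
    # (N - C)^+ - (C - N)^+ = N - C, so take expectations of both sides
    lost = mean - capacity + leftover
    return leftover, lost


# Hypothetical numbers: 50 seats, exponential reservation prices with mean 100,
# and a price set so that the *estimated* rate (100 per season) sells out on average.
capacity, theta, estimated_rate = 50, 100.0, 100.0
price = theta * math.log(estimated_rate / capacity)   # solves 100 * exp(-p / theta) = 50
buy_prob = math.exp(-price / theta)

for true_rate in (50.0, 100.0, 200.0):
    unsold, lost = leftover_and_lost_sales(capacity, true_rate, 1.0, buy_prob)
    print(f"true rate {true_rate:5.0f}: unsold {unsold:5.1f}, lost sales {lost:5.1f}")
```

With these numbers, halving the true rate leaves roughly half the seats unsold, while doubling it turns away roughly as many willing buyers as there are seats.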

In this paper, we present a dynamic pricing model in which customers arrive according to a conditional Poisson process whose rate is not known to the seller in advance. Instead, through pre-sale market research, the seller obtains a prior distribution of the customer arrival rate. As the sale moves forward, the seller uses real-time data on the realized demand to fine-tune the arrival rate estimate, and then uses the refined estimate to better forecast future demand. Consequently, the seller updates the future demand distribution in real time and dynamically sets the product price to maximize the expected total revenue.
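One common way to formalize a prior distribution of the arrival rate that is updated with sales data is a gamma prior on a Poisson rate, because the posterior stays gamma. The text does not say which prior family the model uses, so the gamma choice, the class name `GammaBelief`, and the numbers below are assumptions for illustration. Note that the update needs only two statistics: the number of arrivals and the elapsed time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about the unknown arrival rate (an assumed prior family)."""
    shape: float  # behaves like a count of "pseudo-arrivals"
    rate: float   # behaves like "pseudo-observation time"

    def update(self, arrivals: int, elapsed: float) -> "GammaBelief":
        # Poisson likelihood Lambda^n * exp(-Lambda * t) times a gamma prior
        # gives a gamma posterior: add the count to the shape, the time to the rate.
        return GammaBelief(self.shape + arrivals, self.rate + elapsed)

    def mean(self) -> float:
        return self.shape / self.rate

    def variance(self) -> float:
        return self.shape / self.rate ** 2


# Hypothetical pre-sale belief: mean 100 customers per season, standard deviation 50.
prior = GammaBelief(shape=4.0, rate=0.04)
# After 20% of the season, 40 customers have arrived (twice the prior pace).
posterior = prior.update(arrivals=40, elapsed=0.2)
print(round(posterior.mean(), 1))   # 183.3
```

In this example the posterior mean moves from 100 to about 183, and the posterior variance drops from 2500 to about 764.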

In recent years, the problem of dynamic pricing has drawn much attention. Most research on dynamic pricing assumes that customers arrive according to a stochastic process with independent increments; that is, the numbers of customers in disjoint time intervals are independent random variables. Under this assumption, knowing how many customers have shown up so far provides no information about how many more will show up later, so learning is not possible. For example, in a continuous-time setting, a common assumption is that customers arrive according to a Poisson process with a given intensity function [6], [8], [9]. In a discrete-time setting, time is divided into small intervals such that in each interval a customer arrives with a small probability, independently of everything else [13], [19]. When the demand process has independent increments, the problem is often formulated as a Markov decision process. In most cases, it can be shown that the optimal product price increases in the remaining time and decreases in the current inventory level. However, because the optimal policy is difficult to derive, most research focuses on developing heuristic policies.
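A minimal sketch of the discrete-time formulation with independent increments: in each period at most one customer arrives, with probability `arrival_prob`, and she buys if her reservation price exceeds the posted price. Because arrivals carry no information about the future, the state is just (periods left, inventory), and backward induction gives V_t(n) = V_{t-1}(n) + max_p arrival_prob * (1 - F(p)) * (p - Delta), with Delta = V_{t-1}(n) - V_{t-1}(n - 1). The exponential reservation price distribution (mean theta), under which the maximizing price is simply p = Delta + theta, and the function name `solve_dp` are assumptions, not the formulation of any cited paper.

```python
import math


def solve_dp(periods: int, inventory: int, arrival_prob: float, theta: float):
    """Backward induction for a discrete-time pricing MDP with independent arrivals.

    Returns (value, price): value[n] is the optimal expected revenue with n items
    and `periods` periods left; price[t][n] is the optimal price with t periods
    and n items left. Reservation prices are assumed Exp(mean theta).
    """
    value = [0.0] * (inventory + 1)          # no periods left: unsold items are worthless
    price = [[0.0] * (inventory + 1) for _ in range(periods + 1)]
    for t in range(1, periods + 1):
        new_value = [0.0] * (inventory + 1)  # n = 0: nothing left to sell
        for n in range(1, inventory + 1):
            delta = value[n] - value[n - 1]  # opportunity cost of selling one item now
            p = delta + theta                # maximizes exp(-p / theta) * (p - delta)
            sell_prob = arrival_prob * math.exp(-p / theta)
            new_value[n] = value[n] + sell_prob * (p - delta)
            price[t][n] = p
        value = new_value
    return value, price


value, price = solve_dp(periods=60, inventory=8, arrival_prob=0.3, theta=100.0)
print([round(price[60][n], 1) for n in range(1, 9)])  # optimal prices by inventory level
print([round(price[t][1], 1) for t in (1, 20, 60)])   # last-item price by periods left
```

On small instances, the returned price table can be checked against the structural result quoted above: prices rise with the number of periods remaining and fall as inventory grows.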

Learning models have been studied in the operations management literature as a way to better forecast the future demand curve. Most of this work assumes that the price is exogenous, while the firm decides how much inventory to replenish in each period [2], [12], [16]. Learning models that incorporate both price and replenishment decisions include [3] and [18]. In both papers, the demand curve in each period is a deterministic function of the price, identical across periods, whose parameters are unknown to the decision maker. Based on the realized demand in early periods, the decision maker learns about the demand curve in order to set a proper price later on. Burnetas and Smith [5] considered a similar problem, except that the demands in different periods are independent and identically distributed random variables. They developed a policy under which the realized profit converges with probability one to the optimal value under complete information. These learning models differ from ours: there, the seller learns from repetitions of identical experiments (e.g., the same flight number on different days), whereas in our model the seller learns within the sales horizon of a single event.

The rest of this paper is organized as follows. In Section 2, we introduce a dynamic pricing model in which customers arrive according to a conditional Poisson process, and we show how the seller can improve the estimate of the customer arrival rate from real-time sales data as the sale moves forward. Motivated by these preliminary results, we consider a surrogate dynamic pricing model and derive its optimal policy in Section 3. In Section 4, we use the results from this surrogate model to develop the variable-rate (VR) policy for the original problem described in Section 2. The numerical experiments show that this variable-rate policy is not only nearly optimal but also robust, even when the pre-sale estimate of the customer arrival rate is relatively poor. In Section 5, we extend the results to four settings that are often encountered in practice: (1) lost customers are not observable; (2) the customer arrival process is non-stationary; (3) each customer can request more than one item; and (4) the allowable price set is discrete. Finally, we conclude the paper and discuss future research directions in Section 6.

Section snippets

The model and preliminaries

Consider a dynamic pricing model in which a seller sells a given stock of identical items over a finite time horizon [0, T]. Customers arrive according to a conditional Poisson process with an unknown rate Λ. Upon arrival, a customer purchases one item if the posted price is lower than her reservation price; otherwise, she leaves empty-handed. We assume that the reservation prices of all customers are independent and identically distributed with a continuous cumulative distribution function F.
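A conditional Poisson process is what makes learning possible: given Λ, the counts in disjoint intervals are independent Poisson, but unconditionally they are positively correlated, with Cov(N[0, t1), N[t1, t1 + t2)) = Var(Λ) t1 t2 by the law of total covariance. The simulation below checks this numerically. The gamma distribution for Λ, the function names, and the parameter values are illustrative assumptions (Python's `random.gammavariate` takes a shape and a scale).

```python
import random


def simulate_two_windows(shape, scale, t1, t2, rng):
    """Counts of arrivals in [0, t1) and [t1, t1 + t2) for one sales season."""
    lam = rng.gammavariate(shape, scale)   # the season's unknown rate, drawn once
    n1 = n2 = 0
    clock = rng.expovariate(lam)           # exponential inter-arrival times given lam
    while clock < t1 + t2:
        if clock < t1:
            n1 += 1
        else:
            n2 += 1
        clock += rng.expovariate(lam)
    return n1, n2


def sample_covariance(pairs):
    m = len(pairs)
    mean1 = sum(a for a, _ in pairs) / m
    mean2 = sum(b for _, b in pairs) / m
    return sum((a - mean1) * (b - mean2) for a, b in pairs) / (m - 1)


rng = random.Random(2006)
# Hypothetical prior: Gamma(shape 4, scale 25), i.e. mean 100 and variance 2500 per season.
pairs = [simulate_two_windows(4.0, 25.0, 0.5, 0.5, rng) for _ in range(5000)]
print(sample_covariance(pairs))  # theoretical covariance: Var(Lambda) * t1 * t2 = 625
```

With a known, fixed rate the same covariance would be zero, which is exactly the independent-increments case in which early sales say nothing about later ones.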

A surrogate model

It is difficult, if not impossible, to derive the optimal policy for the continuous-time dynamic pricing problem in the previous section, because the distribution of the customer arrival rate depends on both time elapsed and the number of customers that have shown up. In Section 3.1, we consider a surrogate dynamic pricing problem, which is motivated by the observation that whenever the seller needs to set the product price for an arriving customer, the number of future customers follows a
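The snippet breaks off before naming the distribution of the number of future customers. As one concrete case, if the prior on Λ is assumed to be Gamma(α, β) (shape α, rate β), then after n arrivals in elapsed time t the posterior is Gamma(α + n, β + t), and mixing a Poisson count over the remaining time τ against that posterior yields a negative binomial distribution with mean (α + n)τ / (β + t). The sketch computes that predictive pmf; the prior family, function name, and numbers are assumptions rather than the paper's specification.

```python
import math


def predictive_pmf(k, alpha, beta, arrivals, elapsed, remaining):
    """P(k more arrivals in the remaining time) under an assumed Gamma(alpha, beta) prior.

    Mixing Poisson(Lambda * remaining) over the gamma posterior
    Gamma(alpha + arrivals, beta + elapsed) gives a negative binomial law.
    Requires remaining > 0.
    """
    a = alpha + arrivals                 # posterior shape
    b = beta + elapsed                   # posterior rate
    p = b / (b + remaining)              # negative binomial "success" probability
    log_pmf = (math.lgamma(a + k) - math.lgamma(a) - math.lgamma(k + 1)
               + a * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


# Hypothetical state: 40 arrivals in the first 20% of the season, 80% remaining.
probs = [predictive_pmf(k, 4.0, 0.04, arrivals=40, elapsed=0.2, remaining=0.8)
         for k in range(2000)]
print(sum(probs))                                 # close to 1
print(sum(k * q for k, q in enumerate(probs)))    # close to 44 * 0.8 / 0.24
```

The forecast depends on the history only through the pair (elapsed time, arrivals so far), which matches the observation above that the arrival rate distribution depends on both quantities.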

Dynamic pricing with real-time demand learning

In this section, we return to the model in Section 2, where the seller updates the demand distribution in real time. In Section 4.1, we present a dynamic pricing policy such that the seller sets the product price based on the updated demand distribution. We then present numerical examples to demonstrate this policy’s efficiency in Section 4.2 and its robustness in Section 4.3.
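The VR policy itself is developed in the full text and is not reproduced here. As a simpler stand-in for the idea of re-pricing from an updated demand estimate, the sketch below implements a certainty-equivalent heuristic: at each arrival it replaces the unknown rate by its posterior mean under an assumed gamma prior and posts the price from a deterministic (fluid) approximation with exponential reservation prices. Setting `learn=False` freezes the rate at the prior mean, which mimics pricing from the rough pre-sale estimate. The function names and all parameters are hypothetical.

```python
import math
import random


def run_out_price(rate, time_left, inventory, theta):
    """Price from a deterministic (fluid) approximation with a known rate.

    With exponential reservation prices (mean theta), p = theta maximizes
    revenue per arrival; if that would sell more than the inventory, raise the
    price until expected demand rate * time_left * exp(-p / theta) equals it.
    """
    if rate * time_left * math.exp(-1.0) <= inventory:
        return theta
    return theta * math.log(rate * time_left / inventory)


def simulate_season(true_rate, capacity, horizon, theta, alpha, beta, learn, seed):
    """One season under a certainty-equivalent policy; returns (revenue, items sold)."""
    rng = random.Random(seed)
    clock, arrivals, inventory, revenue = 0.0, 0, capacity, 0.0
    while inventory > 0:
        clock += rng.expovariate(true_rate)
        if clock >= horizon:
            break
        # Gamma-posterior mean given arrivals before this customer, or the fixed prior mean.
        rate_hat = (alpha + arrivals) / (beta + clock) if learn else alpha / beta
        price = run_out_price(rate_hat, horizon - clock, inventory, theta)
        arrivals += 1
        if rng.expovariate(1.0 / theta) > price:   # reservation price ~ Exp(mean theta)
            inventory -= 1
            revenue += price
    return revenue, capacity - inventory


# Hypothetical scenario: the prior mean is 100 customers per season, the truth is 200.
for learn in (False, True):
    results = [simulate_season(200.0, 50, 1.0, 100.0, 4.0, 0.04, learn, seed)
               for seed in range(200)]
    avg = sum(r for r, _ in results) / len(results)
    print("learning" if learn else "fixed prior", round(avg, 1))
```

The two printed averages give a rough comparison for this single hypothetical scenario; they are not the paper's numerical results.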

Extensions

We next consider several extensions to the basic model. The first extension deals with the situation in which the seller can track only the number of items sold, not the number of customers. The second extension considers the case in which the customer arrival rate is time dependent. For these two extensions, we modify the VR policy so that the seller can quote the price from the same three-dimensional table discussed in Section 4.1. The third extension allows each customer to buy multiple items,
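For the first two extensions, the learning step changes in a simple way under an assumed gamma prior. If only sales are recorded, sales form a Poisson process with intensity Λ(1 - F(p)), so each stretch of time counts toward the posterior only in proportion to the purchase probability at the price then posted. A time-dependent rate written as Λ·g(t) with a known shape g (an assumption made here) scales this exposure in the same way. The sketch treats prices and g as piecewise constant and uses exponential reservation prices; the function name is hypothetical, and this is not the modified VR policy described in the paper.

```python
import math


def posterior_from_sales(alpha, beta, segments, sales, theta):
    """Gamma posterior for the rate scale Lambda when only sales are observed.

    segments: (duration, intensity_multiplier, price) triples over which the
    known arrival-shape multiplier g and the posted price are held constant.
    Sales then form a Poisson process with intensity
    Lambda * multiplier * exp(-price / theta), so each segment adds its
    "effective exposure" to the gamma rate parameter.
    """
    exposure = sum(duration * multiplier * math.exp(-price / theta)
                   for duration, multiplier, price in segments)
    return alpha + sales, beta + exposure


# Hypothetical: arrivals run twice as fast in the second tenth of the season, the seller
# cut the price from 100 to 80, 15 units sold, and customers who walked away went unrecorded.
shape, rate = posterior_from_sales(4.0, 0.04, [(0.1, 1.0, 100.0), (0.1, 2.0, 80.0)],
                                   sales=15, theta=100.0)
print(shape / rate)   # posterior mean of the rate scale Lambda
```

With a zero price and a flat shape g = 1, every arrival is a sale and the update reduces to the familiar count-and-elapsed-time rule.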

Conclusions

In this paper, we present a dynamic pricing model in which the seller needs to sell a given stock of identical items by a deadline. Unlike traditional dynamic pricing models, in which the seller knows the customer arrival rate, a key assumption in our model is that the seller can only estimate the arrival rate. As the sale moves forward, the seller collects sales data in real time to fine-tune the customer arrival rate estimate. He then uses this fine-tuned arrival rate estimate to better

Acknowledgements

The author thanks anonymous referees for careful reviews and helpful comments. This material is based upon work supported by the National Science Foundation under Grant No. 0223314. Most of the work was done when the author was in the Grado Department of Industrial and Systems Engineering at Virginia Tech.

References (20)

R.E. Chatwin, Optimal dynamic pricing of perishable products with stochastic demand and a finite set of prices, European Journal of Operational Research (2000).
W.M. Kincaid et al., An inventory pricing problem, Journal of Mathematical Analysis and Applications (1963).
S.P. Ladany et al., Optimal cruise-liner passenger cabin pricing policy, European Journal of Operational Research (1991).
N. Agrawal et al., Estimating negative binomial demand for retail inventory management with unobservable lost sales, Naval Research Logistics (1996).
K.S. Azoury, Bayes solutions to dynamic inventory models under unknown demand distribution, Management Science (1985).
R.J. Balvers et al., Actively learning about demand and the dynamics of price adjustment, The Economic Journal (1990).
G.R. Bitran et al., An application of yield management to the hotel industry considering multiple day stays, Operations Research (1995).
A.N. Burnetas et al., Adaptive ordering and pricing for perishable products, Operations Research (2000).
J.M. Feldman, Fares: To raise or not to raise, Air Transport World (1990).
Y. Feng et al., A continuous-time yield management model with multiple prices and reversible price changes, Management Science (2000).

There are more references available in the full text version of this article.



Published by Elsevier B.V.

Production, Manufacturing and Logistics