A reinforcement learning approach for individualizing erythropoietin dosages in hemodialysis patients

https://doi.org/10.1016/j.eswa.2009.02.041

Abstract

This paper presents a reinforcement learning (RL) approach to anemia management in patients with chronic renal failure undergoing hemodialysis. Erythropoietin (EPO) is the treatment of choice for this kind of anemia, but it is an expensive drug with potentially dangerous side-effects that must be considered, especially for patients who do not respond to the treatment. An individualized treatment therefore appears necessary, and RL is a suitable approach to this problem. Moreover, the resulting policies are similar to medical protocols and can therefore be transferred easily to daily practice. A cohort of 64 patients was included in the study. An implementation of the Q-learning algorithm based on a state-aggregation table and another implementation using a multi-layer perceptron as a function approximator (Q-MLP) are compared with the protocols followed in the Nephrology Unit. The policy obtained by the Q-MLP approach outperforms the hospital policy in terms of the proportion of patients within the targeted hemoglobin range (11.5–12.5 g/dl) at the end of the analyzed period, with an observed increase of 25%. This entails an improvement in patients’ quality of life and considerable economic savings for the health care system, given both the expense of EPO treatment and the costs incurred to alleviate problems related to EPO over-dosing. The approach presented here is completely general and can therefore be applied to any drug dosage optimization problem.

Introduction

Patients who suffer from chronic renal failure (CRF) also tend to suffer from an associated anemia. Currently, erythropoietin (EPO) is the treatment of choice for this kind of anemia. The use of this drug has greatly reduced cardiovascular problems and the need for multiple transfusions. However, EPO is expensive, making the already costly CRF program even more so. There are also significant risks associated with erythropoietic stimulating factors (ESFs), such as thromboembolisms and vascular problems (Lynne Peterson, 2004), if hemoglobin (Hb) levels are too high or increase too quickly. Consequently, dosage optimization is critical to ensure adequate pharmacotherapy as well as a reasonable treatment cost.

Drug administration for chronic conditions is usually a trial-and-error procedure. The relationship between the drug dose and the patient’s response tends to be complex and non-linear; therefore, physicians prefer using protocols, which take into account the average response of populations of patients (Gaweda et al., 2005). There are two reasons why such protocols are not suitable for the problem tackled in this work and an individualized treatment is preferred:

  • (1)

    The response to EPO treatment is highly dependent on the patient. The same dosage may produce very different responses in different patients, most notably in the so-called EPO-resistant patients, who do not respond to EPO treatment even after receiving high dosages.

  • (2)

    It may be very dangerous to apply a trial-and-error procedure to real patients. In fact, in addition to the high treatment cost, the health care system usually invests a considerable amount of money in alleviating side-effects directly related to the treatment.

The goal of this work is to use reinforcement learning (RL) (Sutton & Barto, 1998) to learn a policy for EPO prescription that maintains patients within a targeted Hb range. Since EPO administration is a sequential decision task whose long-term consequences are not known in advance, it is natural to formulate it as an RL problem. Unlike previous approaches based on supervised learning (e.g. Bayesian theory, fuzzy logic, or Artificial Neural Networks (ANNs)) (Bellazzi, 1992, Bellazzi et al., 1994, Jacobs et al., 2001, Martín, Camps, et al., 2003, Martín, Soria, et al., 2003), where examples of “correct” behavior are available, RL methods do not require prior knowledge of optimal performance. Instead, they learn to associate a reward signal (e.g. indicating proximity to the desired Hb concentration) with the actions taken in particular states, so that the amount of reward received in the future is maximized. Therefore, RL can potentially learn, for each patient, the sequence of EPO treatment decisions that leads to long-term maintenance of the desired Hb levels.
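To make the reward formulation concrete, a minimal sketch of one possible reward signal is given below. It rewards Hb values that fall inside a target band and penalizes values outside it in proportion to their distance from the band; the band limits and the penalty shape are assumptions for illustration, since the text does not spell out the exact reward used in the paper.

```python
def hb_reward(hb_next: float, low: float = 11.5, high: float = 12.5) -> float:
    """Hypothetical reward for the next observed hemoglobin value (g/dl).

    Returns a positive reward when Hb lies inside the target band and a
    penalty proportional to the distance from the band otherwise. The band
    limits and penalty shape are illustrative, not the paper's definition.
    """
    if low <= hb_next <= high:
        return 1.0
    # Distance to the nearest edge of the target band, as a negative reward.
    return -min(abs(hb_next - low), abs(hb_next - high))


print(hb_reward(12.0))   # inside the band: 1.0
print(hb_reward(10.2))   # 1.3 g/dl below the band: roughly -1.3
```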

The organization of the paper is as follows. Sections 2 and 3 review related work in this domain and RL, respectively. In Section 4, we present our experiments in learning EPO treatment policies. Section 5 discusses our particular approach, the achieved results, and their potential implications. Finally, Section 6 gives some concluding remarks and suggestions for future work.

Section snippets

Related work

There have been several approaches to optimizing EPO administration automatically. In Bellazzi (1992), EPO delivery was based on parametric identification in a Bayesian framework; in Bellazzi et al. (1994), a fuzzy rule-based control strategy was used; and a few studies used Artificial Neural Networks (ANNs) for individualized EPO dosing (Jacobs et al., 2001, Martín, Camps, et al., 2003, Martín, Soria, et al., 2003). Those approaches were based on using current and previous Hb levels, EPO

Reinforcement learning (RL)

RL is a methodology based on the theory of Markov decision processes (MDPs), whose elements are defined as follows (a minimal illustrative sketch of these elements is given after the list):

  • State (sₜ ∈ S): All of the information that uniquely describes the environment at time t, where S is the set of all possible states.

  • Action (aₜ ∈ A): Action taken by the agent at time t, from the set of possible actions A.

  • Policy (π(s, a)): Probability distribution over the actions in state s.

  • Immediate reward (rₜ₊₁): Value returned by the environment to the agent depending on the
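As an illustration only, the sketch below encodes these elements (state, action, policy, and a sampled action) in Python; the discretization of the state, the number of candidate EPO dose levels, and all identifiers are hypothetical choices, not details taken from the paper.

```python
import random
from typing import Dict, List, Tuple

# Illustrative, discretized MDP elements for EPO dosing (all names assumed).
State = Tuple[int, int]                 # e.g. (binned Hb level, binned previous dose)
Action = int                            # index into a set of candidate EPO doses
ACTIONS: List[Action] = [0, 1, 2, 3]    # hypothetical dose levels

# A stochastic policy pi(s, a): for each state, one probability per action.
Policy = Dict[State, Dict[Action, float]]


def sample_action(policy: Policy, state: State) -> Action:
    """Draw an action a_t according to pi(s_t, .); uniform if the state is unseen."""
    probs = policy.get(state, {a: 1.0 / len(ACTIONS) for a in ACTIONS})
    actions, weights = zip(*probs.items())
    return random.choices(actions, weights=weights, k=1)[0]
```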

Experiments

We applied Q-learning to learn policies that can maintain Hb within a desired range based on patient state information. The most widely accepted recommendation is to maintain Hb between 11 and 13 g/dl (Steensma et al., 2006, Cohen et al., 2005). In this study, our target is to maintain Hb levels between 11.5 and 12.5 g/dl. This narrower range increases the sensitivity of the alert criteria, making it easier to identify patients whose treatment could be improved.

Q-learning was run off-line, i.e.,
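Although the snippet is cut off here, running Q-learning off-line generally means replaying a fixed set of logged transitions and applying the standard update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)] until the values stabilize. The sketch below shows this batch update over recorded (state, action, reward, next state) tuples; the learning rate, discount factor, number of sweeps, and data layout are illustrative assumptions rather than the configuration used by the authors.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

State = Tuple[int, int]                          # aggregated patient state (assumed encoding)
Action = int
Transition = Tuple[State, Action, float, State]  # (s, a, r, s')


def offline_q_learning(
    transitions: List[Transition],
    actions: List[Action],
    alpha: float = 0.1,    # learning rate (assumed)
    gamma: float = 0.95,   # discount factor (assumed)
    sweeps: int = 100,     # passes over the logged data
) -> Dict[Tuple[State, Action], float]:
    """Tabular Q-learning replayed over a fixed set of logged transitions."""
    q: Dict[Tuple[State, Action], float] = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            best_next = max(q[(s_next, a_next)] for a_next in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def greedy_action(q, state, actions):
    """Recommend the dose with the highest learned action value."""
    return max(actions, key=lambda a: q[(state, a)])
```

A greedy policy read off such a table resembles a dosing protocol, since each aggregated patient state maps to a single recommended dose, which is consistent with the paper’s remark that the resulting policies can be transferred easily to daily practice.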

Discussion

In this work, we have proposed a new approach to improve the management of anemia in hemodialysis patients treated with EPO. Previous approaches (Martín, Camps, et al., 2003, Martín, Soria, et al., 2003) are good models for either classification or time regression, but they did not provide a straightforward way to transfer the extracted knowledge to daily clinical practice. In this framework, RL appears to be a promising tool since it provides policies that can be consulted directly to

Conclusions and future work

This paper has presented an application of reinforcement learning to individualized EPO treatment for anemia management in hemodialysis patients. Two implementations of Q-learning with different action-value function representations have been compared: a state-aggregated table (Q-SA), and a multi-layer perceptron neural network (Q-MLP).
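To illustrate the difference between the two action-value representations, the sketch below contrasts a state-aggregated lookup table (as in Q-SA) with a small multi-layer perceptron that maps a state feature vector to one Q-value per action (as in Q-MLP). The number of aggregated states, the feature encoding, and the network sizes are assumptions for illustration; the snippet does not report them.

```python
import numpy as np

N_STATE_FEATURES = 4   # assumed size of the patient state encoding
N_ACTIONS = 4          # assumed number of candidate EPO doses
N_STATES = 64          # assumed number of aggregated states for the table
N_HIDDEN = 16          # assumed hidden-layer width for the MLP

# Q-SA: one table cell per (aggregated state, action) pair.
q_table = np.zeros((N_STATES, N_ACTIONS))


def q_sa(state_index: int, action: int) -> float:
    return float(q_table[state_index, action])


# Q-MLP: a single-hidden-layer perceptron maps a state feature vector to
# one Q-value per action, so nearby states share learned structure.
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(N_STATE_FEATURES, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
w2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)


def q_mlp(state_features: np.ndarray) -> np.ndarray:
    hidden = np.tanh(state_features @ w1 + b1)
    return hidden @ w2 + b2   # vector of Q-values, one per action
```

The table generalizes only within each aggregated cell, whereas the MLP interpolates across similar patient states, which is one common motivation for using a function approximator in problems of this kind.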

The results have shown that a policy obtained by Q-SA has achieved significant improvement in performance over the hospital policy on the training data, but has

Acknowledgements

This work has been partially supported by the Conselleria de Universitat, Empresa i Ciència de la Generalitat Valenciana with the Project ARVIV/2007/094, and by the Spanish Ministry of Education and Science with the Projects TIN2007-61006 and CSD2007-00018.

References (13)

  • A.E. Gaweda et al.

    Individualization of pharmacological anemia management using reinforcement learning

    Neural Networks

    (2005)
  • R. Bellazzi

    Drug delivery optimization through Bayesian networks: An application to erythropoietin therapy in uremic anemia

    Computers and Biomedical Research

    (1992)
  • R. Bellazzi et al.

    Mathematical modeling of erythropoietin therapy in uremic anemia. Does it improve cost-effectiveness?

    Haematologica

    (1994)
  • Bertsekas, D., Borkar, V., & Nedić, A. (2003). Improved temporal difference methods with linear function approximation....
  • D. Bertsekas et al.

    Neuro-dynamic programming

    (1996)
  • Cohen, H., Dowling, T., & Goodin, S. (2005). Anemia in cancer and chronic kidney disease: Clinical and financial...

Cited by (47)

  • Reinforcement learning strategies in cancer chemotherapy treatments: A review

    2023, Computer Methods and Programs in Biomedicine
    Citation excerpt:

    The conceptual use of adaptive optimal control to model the conditions of disease was initiated in the mid-1970s by Swan and Vincent in proposing the first recorded chemotherapy treatment planning system as an optimal control problem [48]. There have been several attempts approached by researchers so far in modeling the curing of diseases with MDP such as comprehensive surveys [49], optimal biopsy decision models [50], optimized HIV therapy models [51], diabetes treatment models [41,52,53], optimized anemia treating models [42,54], image-based cancer therapy models [55], anesthesia controllers [56], vascular tumor growth models [57]. Interestingly, it was suggested if properly implemented, RL would be expected to deliver critical results in the coming years, with a significant impact on the treatment and control of common chronic diseases by influencing the patient-centered health care with external factors such as weather, economic dynamics, and pollution exposure.

  • Reinforcement learning-based decision support system for COVID-19

    2021, Biomedical Signal Processing and Control
  • Reinforcement learning for intelligent healthcare applications: A survey

    2020, Artificial Intelligence in Medicine
    Citation excerpt:

    The proposed method gives daily updates of the basal rate and insulin to carbohydrate ratio for the optimization of glucose regulation. Personalized erythropoietin dosages in hemodialysis patients is addressed in [41] using Q-learning. Erythropoietin is used to manage patients with anemia undergoing chronic renal failure.

  • Enhanced prediction of hemoglobin concentration in a very large cohort of hemodialysis patients by means of deep recurrent neural networks

    2020, Artificial Intelligence in Medicine
    Citation excerpt:

    As a solution, and despite the difficulties, in the last two decades an assortment of models for hemoglobin prediction and/or individualized ESA dosage optimization have been proposed. Earlier works targeted the latter objective, focusing on approaches such as Fuzzy Control [18], Support Vector Regression [19], Reinforcement Learning [20–22], Model Predictive Control [23,24] or Rule-Based Systems [25,26] among others. Recently, the focus seems to have shifted towards building more accurate hemoglobin prediction models, as an obvious requirement for the development of better ESA dosage recommender systems, which can make use of such predictors to simulate the expected outcome of competing treatment choices and choose the optimal one.
