Abstract
This paper presents a method for efficiently reducing the population of cancer cells while injecting the lowest possible drug dose, thereby limiting the drug's side effects on healthy cells. A mathematical model of stem-cell Chronic Myelogenous Leukemia (CML) is used, and a hybrid method combining the eligibility traces algorithm with neural networks is applied to it. The eligibility traces algorithm is a well-known method for solving problems under the Reinforcement Learning (RL) approach; it allows the population of cancer cells to be controlled with higher accuracy, which has a significant impact on the injected dosage. Its backward view is a further advantage: previous states are credited as well, which improves the learning procedure, the speed at which the cancer-cell population is reduced, and the total drug dosage injected over the treatment period in patients with CML. Combining this method with neural networks provides continuous states, so there is no limitation on the states that can be considered when solving the problem; it also accelerates obtaining the optimal dosage with high accuracy, a significant advantage of the proposed method. To demonstrate its effectiveness in controlling the cancer-cell population and obtaining the optimal dosage, the proposed method is compared with four alternatives: the eligibility traces algorithm alone, the Q-learning algorithm alone, optimal control, and no drug injection. The results show that the combination of the eligibility traces algorithm and neural networks controls the cancer-cell population more quickly, with higher accuracy, and with a lower drug dosage.
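To make the backward-view idea concrete, the following is a minimal sketch of SARSA(λ) with eligibility traces and a linear function approximator (the simplest stand-in for the paper's neural network) on a toy dose-control task. The dynamics, feature map, reward, and all hyperparameters below are illustrative assumptions, not the paper's delayed CML model.

```python
import numpy as np

# Toy setup: state = normalized tumour burden in [0, 1]; action 1 = inject dose.
N_BINS, N_ACTIONS = 10, 2
GAMMA, LAM, ALPHA, EPS = 0.95, 0.8, 0.1, 0.1  # illustrative hyperparameters

def features(pop, a):
    """One-hot (population bin, action) feature vector."""
    phi = np.zeros(N_BINS * N_ACTIONS)
    b = min(int(pop * N_BINS), N_BINS - 1)
    phi[a * N_BINS + b] = 1.0
    return phi

def step(pop, a):
    """Assumed toy dynamics: cells grow without the drug, shrink under it."""
    pop = min(1.0, pop * 1.2) if a == 0 else pop * 0.7
    reward = -pop - 0.05 * a          # penalize tumour burden and dosage
    return pop, reward

rng = np.random.default_rng(0)
w = np.zeros(N_BINS * N_ACTIONS)      # value-function weights

def q(pop, a):
    return features(pop, a) @ w

def policy(pop):
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([q(pop, b) for b in range(N_ACTIONS)]))

for episode in range(300):
    pop = rng.uniform(0.2, 1.0)
    a = policy(pop)
    e = np.zeros_like(w)              # eligibility trace: the "backward view"
    for t in range(30):
        nxt, r = step(pop, a)
        na = policy(nxt)
        delta = r + GAMMA * q(nxt, na) - q(pop, a)
        e = GAMMA * LAM * e + features(pop, a)  # decay traces, mark visited state
        w += ALPHA * delta * e                  # credit all recently visited states
        pop, a = nxt, na

# After training, the greedy policy injects the drug at high tumour burden.
greedy_high = int(np.argmax([q(0.9, b) for b in range(N_ACTIONS)]))
print(greedy_high)
```

Because the trace vector `e` keeps a decaying record of recently visited state-action features, each temporal-difference error updates not just the current state but the whole recent trajectory, which is the mechanism the abstract credits for faster learning; replacing the one-hot features with a neural network yields the continuous-state variant the paper proposes.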
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest with any person or organization.
Cite this article
Kalhor, E., Noori, A. & Noori, G. Cancer cells population control in a delayed-model of a leukemic patient using the combination of the eligibility traces algorithm and neural networks. Int. J. Mach. Learn. & Cyber. 12, 1973–1992 (2021). https://doi.org/10.1007/s13042-021-01287-8