Abstract
The high computational capacity afforded by new technologies allows us to connect two great worlds: optimization methods and machine learning. The hybridization of the two is called Learnheuristics, in which optimization methods are improved through machine learning techniques, with the data produced by the optimization methods during the search process serving as the input for learning. Among the most prominent machine learning techniques is Q-Learning, whose learning process rewards or punishes agents according to the consequences of their actions; this reward or punishment is delivered through a reward function. This work compares different Learnheuristics instances composed of the Sine Cosine Algorithm and Q-Learning, which differ in the reward function applied. Preliminary results indicate that the reward function applied influences the quality of the solutions.
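To make the reward-function comparison concrete, the following is a minimal sketch of a tabular Q-Learning update paired with two illustrative reward functions of the kind such a study might compare. This is not the authors' implementation: the state names, action names (labeled after the sine and cosine operators), and both reward functions are assumptions made for illustration only.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q[state][action]

# Hypothetical reward functions: each maps the outcome of one
# metaheuristic search step (did the solution improve?) to a scalar.
def reward_improvement(improved):
    return 1.0 if improved else 0.0   # reward only improving moves

def reward_penalty(improved):
    return 1.0 if improved else -1.0  # additionally punish non-improving moves

# Tiny demo: two illustrative states and two actions named after
# the SCA operators the agent could select between.
Q = {s: {a: 0.0 for a in ("sine", "cosine")} for s in ("explore", "exploit")}
q1 = q_update(Q, "explore", "sine", reward_improvement(True), "exploit")
q2 = q_update(Q, "explore", "cosine", reward_penalty(False), "exploit")
print(q1, q2)  # the two reward schemes push Q-values in opposite directions
```

The point of the sketch is only that, under identical transitions, the choice of reward function alone changes the learned Q-values and hence which operator the agent comes to prefer.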
Acknowledgements
Felipe Cisternas-Caneo and Marcelo Becerra-Rozas are supported by Grant DI Investigación Interdisciplinaria del Pregrado/VRIEA/PUCV/039.324/2020. Broderick Crawford and Wenceslao Palma are supported by Grant CONICYT/FONDECYT/REGULAR/1210810. Ricardo Soto is supported by Grant CONICYT/FONDECYT/REGULAR/1190129. Broderick Crawford, Ricardo Soto and Hanns de la Fuente-Mella are supported by Grant Núcleo de Investigación en Data Analytics/VRIEA/PUCV/039.432/2020. José Lemus-Romani is supported by the National Agency for Research and Development (ANID)/Scholarship Program/DOCTORADO NACIONAL/2019-21191692.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Crawford, B. et al. (2021). A Comparison of Learnheuristics Using Different Reward Functions to Solve the Set Covering Problem. In: Dorronsoro, B., Amodeo, L., Pavone, M., Ruiz, P. (eds) Optimization and Learning. OLA 2021. Communications in Computer and Information Science, vol 1443. Springer, Cham. https://doi.org/10.1007/978-3-030-85672-4_6
Print ISBN: 978-3-030-85671-7
Online ISBN: 978-3-030-85672-4