Abstract
Quantum Computing promises the availability of computational resources and generalization capabilities well beyond the possibilities of classical computers. An interesting approach for leveraging near-term, Noisy Intermediate-Scale Quantum computers is the hybrid training of Parameterized Quantum Circuits (PQCs), i.e. the optimization of a parameterized quantum algorithm, used as a function approximator, with classical optimization techniques. When PQCs are used in Machine Learning models, they may offer advantages over classical models in terms of memory consumption and sample complexity for classical data analysis. In this work we explore and assess the advantages of applying Parameterized Quantum Circuits to one of the state-of-the-art Reinforcement Learning algorithms for continuous control, namely Soft Actor-Critic. We investigate its performance on the control of a virtual robotic arm by means of digital simulations of quantum circuits. A quantum advantage over the classical algorithm has been found in terms of a significant decrease in the number of parameters required for satisfactory model training, paving the way for further developments and studies.
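The hybrid scheme described above, a quantum circuit evaluated by classical simulation and trained with a classical optimizer, can be illustrated with a minimal sketch. This is not the authors' actual circuit or training setup: it is a hypothetical single-qubit example with an RY data-encoding rotation followed by one trainable RY rotation, whose Pauli-Z expectation value serves as the model output, with the gradient obtained via the parameter-shift rule.

```python
import numpy as np

def ry(angle):
    """Matrix of a single-qubit RY rotation gate (real-valued)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

def expectation_z(x, theta):
    """<Z> after applying RY(x) data encoding, then trainable RY(theta), to |0>."""
    state = ry(theta) @ ry(x) @ np.array([1.0, 0.0])
    z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(state @ z @ state)

def parameter_shift_grad(x, theta):
    """Gradient of <Z> w.r.t. theta via the parameter-shift rule:
    0.5 * (f(theta + pi/2) - f(theta - pi/2))."""
    shift = np.pi / 2
    return 0.5 * (expectation_z(x, theta + shift)
                  - expectation_z(x, theta - shift))

# Toy classical training loop: gradient descent on theta so that the
# circuit output matches a target expectation value for a fixed input.
x, theta, target, lr = 0.3, 0.0, 0.0, 0.5
for _ in range(200):
    error = expectation_z(x, theta) - target
    theta -= lr * 2.0 * error * parameter_shift_grad(x, theta)
```

For this circuit the output is analytically cos(x + theta), so the parameter-shift estimate can be checked against the exact derivative -sin(x + theta); in full PQC models the same rule provides exact gradients where automatic differentiation of the quantum hardware is not available.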
Acknowledgements
This work was fully funded by NTT DATA Corporation (Japan) and supported by NTT DATA Italia S.p.A. (Italy).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Policicchio, A., Acuto, A., Barillà, P., Bozzolo, L., Conterno, M. (2025). A Variational Quantum Soft Actor-Critic Algorithm for Continuous Control Tasks. In: Sergeyev, Y.D., Kvasov, D.E., Astorino, A. (eds) Numerical Computations: Theory and Algorithms. NUMTA 2023. Lecture Notes in Computer Science, vol 14478. Springer, Cham. https://doi.org/10.1007/978-3-031-81247-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-81246-0
Online ISBN: 978-3-031-81247-7
eBook Packages: Computer Science, Computer Science (R0)