A Variational Quantum Soft Actor-Critic Algorithm for Continuous Control Tasks

  • Conference paper
Numerical Computations: Theory and Algorithms (NUMTA 2023)

Abstract

Quantum Computing promises computational resources and generalization capabilities well beyond the possibilities of classical computers. An interesting approach for leveraging near-term, Noisy Intermediate-Scale Quantum computers is the hybrid training of Parameterized Quantum Circuits (PQCs), i.e. the optimization of parameterized quantum algorithms, used as function approximators, with classical optimization techniques. When PQCs are used in Machine Learning models, they may offer advantages over classical models in terms of memory consumption and sample complexity for classical data analysis. In this work we explore and assess the advantages of applying Parameterized Quantum Circuits to one of the state-of-the-art Reinforcement Learning algorithms for continuous control, namely Soft Actor-Critic. We investigate its performance on the control of a virtual robotic arm by means of digital simulations of quantum circuits. A quantum advantage over the classical algorithm has been found in terms of a significant decrease in the number of parameters required for satisfactory model training, paving the way for further developments and studies.
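
To make the hybrid training loop concrete, here is a minimal sketch (an illustration of the general technique, not of the circuits or software used in this paper): a single-parameter circuit RY(theta)|0> is simulated exactly with NumPy, its Pauli-Z expectation value serves as the model output, and plain gradient descent updates the parameter using exact gradients from the parameter-shift rule.

import numpy as np

def ry(theta):
    """Single-qubit RY rotation gate as a 2x2 matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expectation_z(theta):
    """Prepare RY(theta)|0> and return the Pauli-Z expectation <Z> = cos(theta)."""
    state = ry(theta) @ np.array([1.0, 0.0])
    pauli_z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(state @ pauli_z @ state)

def parameter_shift_grad(theta):
    """Exact gradient of <Z> w.r.t. theta via the parameter-shift rule."""
    shift = np.pi / 2
    return 0.5 * (expectation_z(theta + shift) - expectation_z(theta - shift))

# Classical optimization loop: minimize <Z>, whose optimum -1 lies at theta = pi.
theta, learning_rate = 0.1, 0.4
for _ in range(60):
    theta -= learning_rate * parameter_shift_grad(theta)
print(f"theta = {theta:.4f}, <Z> = {expectation_z(theta):.4f}")  # approx. pi, -1

In a full variational Soft Actor-Critic, multi-qubit PQCs would play the role of the actor and critic function approximators, with a classical optimizer updating all circuit parameters in the same manner.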

Acknowledgements

This work was fully funded by NTT DATA Corporation (Japan) and supported by NTT DATA Italia S.p.A. (Italy).

Author information

Correspondence to Antonio Policicchio.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Policicchio, A., Acuto, A., Barillà, P., Bozzolo, L., Conterno, M. (2025). A Variational Quantum Soft Actor-Critic Algorithm for Continuous Control Tasks. In: Sergeyev, Y.D., Kvasov, D.E., Astorino, A. (eds) Numerical Computations: Theory and Algorithms. NUMTA 2023. Lecture Notes in Computer Science, vol 14478. Springer, Cham. https://doi.org/10.1007/978-3-031-81247-7_13

  • DOI: https://doi.org/10.1007/978-3-031-81247-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-81246-0

  • Online ISBN: 978-3-031-81247-7

  • eBook Packages: Computer Science, Computer Science (R0)
