
Deep Reinforcement Learning With Reward Design for Quantum Control

Impact Statement:
Over the past few years, quantum machine learning has received growing attention. In particular, reinforcement learning (RL) and quantum physics have increasingly intersected, and impressive results have been achieved by applying RL algorithms to quantum system tasks. Despite these advances, the full potential of RL in quantum physics remains largely unexplored. A major limitation is how to reward the learning agent: most previous works rely on hand-designed rewards, which tend to be time-consuming to craft and biased by human empirical knowledge. The RL algorithm proposed in this paper reduces this limitation by generating rewards automatically during the learning process. This paper contributes to research on automated reinforcement learning and quantum machine learning.

Abstract:

Deep reinforcement learning (DRL) has been recognized as a powerful tool in quantum physics, where reward design is nontrivial but crucial for quantum control tasks. To address the over-reliance on human empirical knowledge in designing DRL rewards, we propose a DRL method with a novel reward paradigm designed from the learning process information (DRL-LPI), where the learning process information (LPI) comprises the state information and the experiences. In DRL-LPI, the state information, classified by a fidelity threshold, and the experiences are first stored simultaneously in their respective sequences, and this process is repeated until a similar-segment ends. Then, the stored state information is converted to real values and used to design the reward values by applying a self-amplitude function. Next, the designed reward values are integrated with the stored experiences to compose transitions for DRL training. Through comparisons to five representative reward sc...
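To make the reward-design pipeline in the abstract concrete, the following is a minimal Python sketch of the paradigm: classify stored state information by a fidelity threshold, map it to real values through a self-amplitude function, and fuse the resulting rewards with the stored experiences into training transitions. The function name, the binary classification, and the exponential form of the self-amplitude function are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def design_rewards_from_lpi(fidelities, experiences, threshold=0.9, amplitude=2.0):
    """Sketch of reward design from learning process information (LPI).

    fidelities  : per-step fidelities observed within one similar-segment
    experiences : list of (state, action, next_state) tuples stored alongside
    threshold   : fidelity threshold used to classify the state information
    amplitude   : gain of the (assumed) self-amplitude function
    """
    # 1. Classify the stored state information by the fidelity threshold
    #    (a simple binary classification is assumed here).
    labels = [1.0 if f >= threshold else 0.0 for f in fidelities]

    # 2. Convert the classified state information to real reward values via a
    #    self-amplitude function; an exponential gain is assumed here.
    rewards = [amplitude ** (lab * f) - 1.0 for lab, f in zip(labels, fidelities)]

    # 3. Integrate the designed rewards with the stored experiences to compose
    #    transitions (s, a, r, s') for the DRL agent's training.
    return [(s, a, r, s_next)
            for (s, a, s_next), r in zip(experiences, rewards)]

# Toy usage with random placeholders for quantum-control states and actions.
rng = np.random.default_rng(0)
fids = rng.uniform(0.7, 1.0, size=5)
exps = [(rng.standard_normal(4), rng.integers(0, 3), rng.standard_normal(4))
        for _ in range(5)]
for s, a, r, s_next in design_rewards_from_lpi(fids, exps):
    print(f"action={a}, reward={r:.3f}")
```

In this sketch, rewards are produced only from information gathered during learning (fidelities and stored experiences), with no hand-tuned per-task reward shaping, which is the point of the LPI-based paradigm.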
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 3, March 2024)
Page(s): 1087 - 1101
Date of Publication: 28 November 2022
Electronic ISSN: 2691-4581

