
Self-Model-Free Learning Versus Learning With External Rewards in Information Constrained Environments


Impact Statement:
A complicated communication topology that forms the backbone of an RL algorithm poses several threats to a cyber-physical system's learning mechanism. Increased modularity, combined with a static communication topology for data sharing, exposes communication channels to attacks by malicious agents, such as jamming and sensor spoofing. In the absence of a reinforcement signal from the adversarial, information-constrained environment (the reward drops to zero), it becomes challenging for the learning agent to optimize its control policies. Here, we present a method that internally compensates for reinforcement loss by constructing a goal network. When the learning agent has access to only a portion of the reinforcement signal, a trade-off mechanism is proposed so that the goal network updates when signals are available and informs the policy evaluation step when they are dropped. To the best of our knowledge, no synchronous internal compensation methods with theoretical guarantees against reinforcement loss exist in the literature.

Abstract:

In this article, we provide a model-free reinforcement learning (RL) framework that relies on internal reinforcement signals, called self-model-free RL, for learning agents that experience loss of the reinforcement signals in the form of packet drops and/or jamming attacks by malicious agents. The framework embeds a correcting mechanism in the form of a goal network to compensate for information loss and produce optimal and stabilizing policies. It also provides a trade-off scheme that reconstructs the reward using a goal network whenever the reinforcement signals are lost but utilizes true reinforcement signals when they are available. The stability of the equilibrium point is guaranteed despite fractional information loss in the reinforcement signals. Finally, simulation results validate the efficacy of the proposed work.
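The abstract describes a trade-off scheme that uses the true reinforcement signal whenever it arrives and falls back on a goal-network reconstruction when the signal is dropped. The following is a minimal sketch of that switching idea, not the authors' implementation: the names (GoalNetwork, effective_reward), the linear approximator, and the learning rate are illustrative assumptions.

```python
# Sketch of a goal network that learns to predict the reinforcement signal while
# it is available and supplies a reconstructed reward when the signal is lost
# (e.g., due to packet drops or jamming). Names and architecture are assumptions.

import numpy as np

class GoalNetwork:
    def __init__(self, state_dim, action_dim, lr=0.01):
        self.w = np.zeros(state_dim + action_dim)  # toy linear approximator
        self.lr = lr

    def predict(self, state, action):
        x = np.concatenate([state, action])
        return float(self.w @ x)

    def update(self, state, action, true_reward):
        # Gradient step toward the observed reinforcement signal.
        x = np.concatenate([state, action])
        error = true_reward - self.w @ x
        self.w += self.lr * error * x

def effective_reward(goal_net, state, action, received_reward):
    """Trade-off scheme: use the true reinforcement signal when it is received,
    otherwise fall back on the goal network's internal estimate."""
    if received_reward is None:  # signal dropped or jammed
        return goal_net.predict(state, action)
    goal_net.update(state, action, received_reward)
    return received_reward
```

In this sketch, the reconstructed reward would feed the policy evaluation step of an otherwise standard model-free RL loop whenever the environment's signal is unavailable; the paper additionally provides stability guarantees under fractional information loss, which the sketch does not capture.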
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 12, December 2024)
Page(s): 6566 - 6579
Date of Publication: 18 September 2024
Electronic ISSN: 2691-4581

