Significance extraction based on data augmentation for reinforcement learning

Han, Yuxi; Li, Dequan; Yang, Yang

doi:10.1631/FITEE.2400406

Research Article
Published: 06 March 2025

Volume 26, pages 385–399, (2025)
Frontiers of Information Technology & Electronic Engineering

Abstract

Deep reinforcement learning has shown remarkable capabilities in visual tasks, but it generalizes poorly when the input images contain interference signals, so trained agents are difficult to apply in new environments. Data augmentation and auxiliary networks have proven to be effective ways of enabling agents to distinguish noise signals from important pixels in images. We introduce a novel algorithm, saliency-extracted Q-value by augmentation (SEQA), which encourages the agent to explore unknown states more comprehensively and to focus its attention on important information. Specifically, SEQA masks out interfering features, extracts salient features, and then updates the mask decoder network with critic losses so that the agent focuses on important features and makes correct decisions. We evaluate our algorithm on the DeepMind Control generalization benchmark (DMControl-GB), and the experimental results show that it greatly improves training efficiency and stability. Moreover, our algorithm outperforms state-of-the-art reinforcement learning methods in sample efficiency and generalization on most DMControl-GB tasks.
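
The general idea of training a mask with a critic loss can be illustrated with a toy sketch. This is not the SEQA implementation: the linear frozen critic, the four-"pixel" observations, the regression target standing in for a TD target, and the function name `train_soft_mask` are all assumptions made for illustration. The sketch only shows how the gradient of a squared critic error, passed through a sigmoid mask, can suppress distractor inputs while keeping task-relevant ones.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_soft_mask(obs, q_target, critic_w, steps=500, lr=0.5):
    """Fit per-pixel mask logits by gradient descent on a squared critic error.

    Hypothetical helper for illustration; the critic weights stay frozen.
    """
    logits = np.zeros(obs.shape[1])          # mask starts at 0.5 everywhere
    losses = []
    for _ in range(steps):
        mask = sigmoid(logits)
        q_pred = (obs * mask) @ critic_w     # critic sees only the masked observation
        err = q_pred - q_target
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error with respect to the mask values.
        grad_mask = 2.0 * (err[:, None] * obs * critic_w).mean(axis=0)
        # Chain rule through the sigmoid to reach the mask logits.
        logits -= lr * grad_mask * mask * (1.0 - mask)
    return sigmoid(logits), losses

rng = np.random.default_rng(0)
obs = rng.normal(size=(256, 4))              # 2 task pixels + 2 distractor pixels (assumed)
critic_w = np.ones(4)                        # frozen toy critic (assumed)
q_target = obs[:, 0] + obs[:, 1]             # stand-in for a TD target (assumed)
mask, losses = train_soft_mask(obs, q_target, critic_w)
```

In this toy setup the first two mask entries move toward 1 and the last two toward 0, because only the first two inputs carry information about the target. A real agent would use image observations, a learned critic, and bootstrapped targets instead.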

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author information

Authors and Affiliations

Faculty of Artificial Intelligence, Anhui University of Science and Technology, Huainan, 232000, China
Yuxi Han (韩玉玺), Dequan Li (李德权) & Yang Yang (杨洋)

Contributions

Yuxi HAN designed the research and drafted the paper. Yang YANG and Dequan LI helped organize the paper. Yuxi HAN and Dequan LI revised and finalized the paper.

Corresponding author

Correspondence to Dequan Li (李德权).

Ethics declarations

All the authors declare that they have no conflict of interest.

Additional information

Project supported by the Academic and Technical Leaders and Backup Candidates Program of Anhui Province, China (No. 2019h211) and the Natural Science Foundation of Anhui Province, China (No. 2208085ME128).

About this article

Cite this article

Han, Y., Li, D. & Yang, Y. Significance extraction based on data augmentation for reinforcement learning. Front Inform Technol Electron Eng 26, 385–399 (2025). https://doi.org/10.1631/FITEE.2400406

Received: 17 May 2024
Accepted: 18 September 2024
Published: 06 March 2025
Issue Date: March 2025
DOI: https://doi.org/10.1631/FITEE.2400406

Key words

CLC number

TP391.4
