Abstract
Designing an intelligent and autonomous system remains a great challenge in the assembly field. Most reinforcement learning (RL) methods are applied to experiments with relatively small state spaces, and the complexity and high-dimensional state space of the assembly environment cause traditional RL methods to perform poorly in both efficiency and accuracy. In this paper, a model-driven adaptive proximal policy optimization (MAPPO) method was presented to enable the assembly system to autonomously rectify bolt posture errors. In the MAPPO method, a probabilistic tree and an adaptive reward mechanism were used to improve the computational efficiency and accuracy of the traditional PPO method. The size of the action space was reduced by establishing a hierarchical logical relationship among the parameters with the probabilistic tree, and the adaptive reward mechanism mitigated the tendency of the algorithm to fall into local minima. Finally, the proposed method was verified in the Unity simulation engine, and the advancement and robustness of the proposed model were validated by comparing different cases in simulations and experiments. The results revealed that MAPPO achieves better efficiency and accuracy than other state-of-the-art algorithms.
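The MAPPO method described above builds on the standard PPO algorithm. As background only, the sketch below shows PPO's clipped surrogate objective, which limits how far each policy update can move from the old policy; this is a minimal illustration of generic PPO, not the authors' MAPPO implementation, and the clipping range eps=0.2 is an assumed common default.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from standard PPO (Schulman et al.).

    ratio:     pi_new(a|s) / pi_old(a|s), per sample
    advantage: estimated advantage A(s, a), per sample
    eps:       clipping range; 0.2 is a common default (assumed here)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum keeps the update pessimistic: a large
    # policy change cannot increase the objective beyond the clip range.
    return np.minimum(unclipped, clipped)

# A ratio of 1.5 with positive advantage is clipped at 1 + eps = 1.2:
print(ppo_clip_objective(np.array([1.5]), np.array([1.0])))
```

The minimum over the clipped and unclipped terms is what discourages destructively large policy updates; MAPPO's probabilistic tree and adaptive reward then address the action-space size and local-minima issues on top of this base objective.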
Acknowledgments
We gratefully acknowledge the financial support from the National Defence Basic Scientific Research Program of China (JCKY2018208A001), and Tsinghua University-Weichai Power Joint Institute of Intelligent Manufacturing (JIIM02).
Ethics declarations
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Luo, W., Zhang, J., Feng, P. et al. An adaptive adjustment strategy for bolt posture errors based on an improved reinforcement learning algorithm. Appl Intell 51, 3405–3420 (2021). https://doi.org/10.1007/s10489-020-01906-x