
An adaptive adjustment strategy for bolt posture errors based on an improved reinforcement learning algorithm


Abstract

Designing an intelligent and autonomous system remains a great challenge in the assembly field. Most reinforcement learning (RL) methods are applied to experiments with relatively small state spaces. However, the complexity and high-dimensional state space of the assembly environment cause traditional RL methods to perform poorly in terms of efficiency and accuracy. In this paper, a model-driven adaptive proximal policy optimization (MAPPO) method was presented to enable the assembly system to autonomously rectify bolt posture errors. In the MAPPO method, a probabilistic tree and an adaptive reward mechanism were used to improve the computational efficiency and accuracy of the traditional PPO method. The size of the action space was reduced by establishing a hierarchical logical relationship among the parameters with the probabilistic tree, and the adaptive reward mechanism reduced the tendency of the algorithm to fall into local minima. Finally, the proposed method was verified in the Unity simulation engine. The advancement and robustness of the proposed model were also validated by comparing different cases in simulations and experiments. The results revealed that MAPPO achieves better efficiency and accuracy than other state-of-the-art algorithms.
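As a hedged illustration of the two mechanisms summarized above, the Python sketch below shows how a two-level "probabilistic tree" can factor the action space into a parameter-group choice followed by an adjustment within that group, and how an adaptive weight can rescale the error-reduction reward when training progress stalls. The group names, adjustment values, and stall-based weighting rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch (not the authors' code) of two ideas from the abstract:
# 1) Probabilistic-tree action selection: pick a parameter group first, then a
#    concrete adjustment inside that group, so the policy never enumerates the
#    full Cartesian product of all adjustment parameters.
# 2) Adaptive reward: the bonus for reducing the bolt posture error is
#    re-weighted when progress stalls, discouraging convergence to local minima.
# All names and values (GROUPS, adaptive_reward, sample_action) are assumptions.

GROUPS = {
    "position": np.array([-1.0, -0.1, 0.1, 1.0]),       # mm adjustments (assumed)
    "orientation": np.array([-0.5, -0.05, 0.05, 0.5]),  # deg adjustments (assumed)
}

def sample_action(group_probs, leaf_probs, rng):
    """Two-level sampling: choose a group node, then a leaf adjustment."""
    g_idx = rng.choice(len(GROUPS), p=group_probs)
    group = list(GROUPS)[g_idx]
    a_idx = rng.choice(len(GROUPS[group]), p=leaf_probs[group])
    return group, GROUPS[group][a_idx]

def adaptive_reward(prev_error, new_error, stall_steps, base_weight=1.0, boost=0.2):
    """Reward the reduction in posture error; grow the weight while progress stalls."""
    weight = base_weight + boost * stall_steps  # larger incentive to leave a plateau
    return weight * (prev_error - new_error)

rng = np.random.default_rng(0)
group_probs = np.array([0.6, 0.4])
leaf_probs = {g: np.full(4, 0.25) for g in GROUPS}
group, delta = sample_action(group_probs, leaf_probs, rng)
reward = adaptive_reward(prev_error=2.0, new_error=1.7, stall_steps=3)
print(group, delta, round(reward, 3))
```

In a PPO-style agent, the two sampling levels would be produced by the policy network and the adaptive weight would enter the reward returned by the environment; the sketch only makes the factorization and re-weighting explicit.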



References

  1. Acharya UR, Fujita H, Lih OS, Hagiwara Y, Tan JH, Adam M (2017) Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network. Inf Sci 405:81–90. https://doi.org/10.1016/j.ins.2017.04.012

    Article  Google Scholar 

  2. Sudarshan VK, Mookiah MRK, Acharya UR, Chandran V, Molinari F, Fujita H, Ng KH (2016) Application of wavelet techniques for cancer diagnosis using ultrasound images: a review. Comput Biol Med 69:97–111. https://doi.org/10.1016/j.compbiomed.2015.12.006

    Article  Google Scholar 

  3. Acharya UR, Fujita H, Oh SL, Hagiwara Y, Tan JH, Adam M (2017) Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Inf Sci 415-416:190–198. https://doi.org/10.1016/j.ins.2017.06.027

    Article  Google Scholar 

  4. Capuano N, Chiclana F, Fujita H, Herrera-Viedma E, Loia V (2018) Fuzzy group decision making with incomplete information guided by social influence. IEEE Trans Fuzzy Syst 26(3):1704–1718. https://doi.org/10.1109/TFUZZ.2017.2744605

    Article  Google Scholar 

  5. Protopapadakis E, Voulodimos A, Doulamis A, Doulamis N, Stathaki T (2019) Automatic crack detection for tunnel inspection using deep learning and heuristic image post-processing. Appl Intell 49(7):2793–2806. https://doi.org/10.1007/s10489-018-01396-y

    Article  Google Scholar 

  6. Villalonga A, Beruvides G, Castaño F, Haber RE (2020) Cloud-based industrial cyber–physical system for data-driven reasoning: a review and use case on an industry 4.0 pilot line. IEEE Trans Ind Informatics 16(9):5975–5984. https://doi.org/10.1109/TII.2020.2971057

    Article  Google Scholar 

  7. Gullapalli V, Franklin JA, Benbrahim H (1994) Acquiring robot SKILLS via reinforcement learning. IEEE Control Syst Mag 14(1):13–24. https://doi.org/10.1109/37.257890

    Article  Google Scholar 

  8. Yang BH, Asada H (1996) Progressive learning and its application to robot impedance learning. IEEE Trans Neural Netw 7(4):941–952. https://doi.org/10.1109/72.508937

    Article  Google Scholar 

  9. Nuttin M, VanBrussel H (1997) Learning the peg-into-hole assembly operation with a connectionist reinforcement technique. Comput Ind 33(1):101–109. https://doi.org/10.1016/s0166-3615(97)00015-8

    Article  Google Scholar 

  10. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. IEEE Trans Neural Netw 9(5):1054–1054. https://doi.org/10.1109/TNN.1998.712192

    Article  Google Scholar 

  11. Ding S, Du W, Zhao X, Wang L, Jia W (2019) A new asynchronous reinforcement learning algorithm based on improved parallel PSO. Appl Intell 49(12):4211–4222. https://doi.org/10.1007/s10489-019-01487-4

    Article  Google Scholar 

  12. Liu P, Zhao Y, Zhao W, Tang X, Yang Z (2019) An exploratory rollout policy for imagination-augmented agents. Appl Intell 49(10):3749–3764. https://doi.org/10.1007/s10489-019-01484-7

    Article  Google Scholar 

  13. Yang CG, Zeng C, Cong Y, Wang N, Wang M (2019) A learning framework of adaptive manipulative Skills from human to robot. Ieee Trans Ind Informatics 15(2):1153–1161. https://doi.org/10.1109/tii.2018.2826064

    Article  Google Scholar 

  14. Wan A, Xu J, Chen HP, Zhang S, Chen K (2017) Optimal path planning and control of assembly robots for hard-measuring easy-deformation assemblies. Ieee-Asme Trans Mechatron 22(4):1600–1609. https://doi.org/10.1109/tmech.2017.2671342

    Article  Google Scholar 

  15. Xu J, Hou ZM, Wang W, Xu BH, Zhang KG, Chen K (2019) Feedback deep deterministic policy gradient with fuzzy reward for robotic multiple peg-in-hole assembly tasks. Ieee Trans Ind Informatics 15(3):1658–1667. https://doi.org/10.1109/tii.2018.2868859

    Article  Google Scholar 

  16. Young Ho K, Lewis FL (2000) Reinforcement adaptive learning neural-net-based friction compensation control for high speed and precision. IEEE Trans Control Syst Technol 8(1):118–126. https://doi.org/10.1109/87.817697

    Article  Google Scholar 

  17. Deisenroth MP, Fox D, Rasmussen CE (2015) Gaussian processes for data-efficient learning in robotics and control. IEEE Trans Pattern Anal Mach Intell 37(2):408–423. https://doi.org/10.1109/TPAMI.2013.218

    Article  Google Scholar 

  18. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv e-prints

  19. Mnih V, Badia AP, Mirza M, Graves A, Harley T, Lillicrap TP, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning, vol 48. Paper presented at the proceedings of the 33rd International Conference on International Conference on Machine Learning, New York

    Google Scholar 

  20. Wang Z, Bapst V, Heess N, Mnih V, Munos R, Kavukcuoglu K, de Freitas N (2016) Sample efficient actor-critic with experience replay

  21. Mousavi SS, Schukat M, Howley E (2018) Deep reinforcement learning: an overview. In: Bi Y, Kapoor S, Bhatia R (eds) Proceedings of Sai Intelligent Systems Conference, vol 16. Lecture Notes in Networks and Systems. pp 426-440. https://doi.org/10.1007/978-3-319-56991-8_32

  22. Singh S, Lewis RL, Barto AG, Sorg J (2010) Intrinsically motivated reinforcement learning: an evolutionary perspective. IEEE Trans Auton Ment Dev 2(2):70–82. https://doi.org/10.1109/tamd.2010.2051031

    Article  Google Scholar 

  23. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236

    Article  Google Scholar 

  24. Gershman SJ, Daw ND (2017) Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu Rev Psychol 68:101–128. https://doi.org/10.1146/annurev-psych-122414-033625

    Article  Google Scholar 

  25. Lewis FL, Vamvoudakis KG (2011) Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output data. IEEE Trans Syst Man Cybernetics Part B (Cybernetics) 41(1):14–25. https://doi.org/10.1109/TSMCB.2010.2043839

    Article  Google Scholar 

  26. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. Computer Science

  27. Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. Paper presented at the Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research,

  28. Schulman J, Levine S, Moritz P, Jordan MI, Abbeel P (2015) Trust region policy optimization. arXiv e-prints:arXiv:1502.05477

Download references

Acknowledgments

We gratefully acknowledge the financial support from the National Defence Basic Scientific Research Program of China (JCKY2018208A001) and the Tsinghua University-Weichai Power Joint Institute of Intelligent Manufacturing (JIIM02).

Author information


Corresponding author

Correspondence to Jianfu Zhang.

Ethics declarations

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Luo, W., Zhang, J., Feng, P. et al. An adaptive adjustment strategy for bolt posture errors based on an improved reinforcement learning algorithm. Appl Intell 51, 3405–3420 (2021). https://doi.org/10.1007/s10489-020-01906-x
