College of Automation Engineering, Shanghai University of Electric Power
[Objective] The study addresses the challenge of dynamic obstacle avoidance for robot manipulators operating in unstructured environments. Traditional motion planning algorithms often struggle with real-time adaptability and responsiveness to dynamic changes, especially in scenarios involving non-static obstacles and targets, where the ability to adapt quickly and accurately is crucial for safe and efficient operation. This research therefore aims to develop an advanced algorithm based on deep reinforcement learning (DRL) that effectively balances dynamic obstacle avoidance and target tracking, ensuring the safe and efficient operation of robot manipulators in complex, unpredictable scenarios.

[Methods] To achieve this goal, a DRL framework built on the soft actor-critic (SAC) algorithm was designed. SAC, well suited to continuous control tasks, uses neural networks to handle high-dimensional problems without requiring a precise model of the environment; the robot manipulator learns optimal control strategies through trial-and-error interaction with its surroundings. The proposed method incorporates a comprehensive reward function that balances critical factors, including end-effector and body obstacle avoidance, self-collision prevention, precise target reaching, and motion smoothness. This reward function guides the learning process by providing clear feedback signals that encourage the agent to develop efficient and safe behaviors. The state space gives a complete representation of the environment, covering the robot manipulator, obstacles, and target: joint angles, joint velocities, the end-effector position and orientation, and key points on the manipulator's body. This holistic representation ensures that the agent has all the information it needs to make accurate and efficient decisions. The action space is defined by joint accelerations, which are integrated into planned joint velocities and sent to the manipulator as control commands. This control strategy effectively eliminates motion singularities, enabling smooth and continuous operation.
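To make the state-action design above concrete, the following is a minimal, illustrative Gym-style sketch (written against Gymnasium, which current Stable-Baselines3 expects), not the authors' code; the class name, dimensions, bounds, and placeholder dynamics are all assumptions, and the real environment would query PyBullet for these quantities.

```python
# Illustrative sketch of the described state/action design (not the authors'
# implementation). Dimensions, bounds, and placeholder kinematics are assumed.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ManipulatorAvoidEnvSketch(gym.Env):
    def __init__(self, n_joints=6, n_keypoints=4, dt=1.0 / 240.0):
        self.n_joints, self.n_keypoints, self.dt = n_joints, n_keypoints, dt
        # Observation: joint angles + joint velocities + EE position (3) +
        # EE orientation quaternion (4) + body key points + obstacle + target.
        obs_dim = 2 * n_joints + 3 + 4 + 3 * n_keypoints + 3 + 3
        self.observation_space = spaces.Box(-np.inf, np.inf, (obs_dim,), np.float32)
        # Action: normalized joint accelerations, integrated into velocities.
        self.action_space = spaces.Box(-1.0, 1.0, (n_joints,), np.float32)
        self.q = np.zeros(n_joints)    # joint angles
        self.qd = np.zeros(n_joints)   # planned joint velocities

    def step(self, action):
        # Integrate commanded accelerations into planned joint velocities;
        # commanding joint velocities (rather than Cartesian IK) is how the
        # paper's scheme sidesteps motion singularities.
        self.qd = np.clip(self.qd + np.asarray(action) * self.dt, -1.0, 1.0)
        self.q = self.q + self.qd * self.dt
        # ... in the real env: send self.qd to the arm, step PyBullet, and
        # recompute the EE pose, body key points, and obstacle distances ...
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)  # placeholder
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.q[:] = 0.0
        self.qd[:] = 0.0
        return np.zeros(self.observation_space.shape, dtype=np.float32), {}
```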
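The reward terms named above could, for illustration, be combined as a weighted sum; the weights and the exponential repulsion shape in this sketch are assumptions, not the paper's actual formulation.

```python
import numpy as np

# Illustrative composite reward (weights and term shapes are assumptions):
# attraction to the target, repulsion of the end-effector and body key points
# from the obstacle, and a smoothness penalty on commanded accelerations.
def composite_reward(ee_pos, target_pos, obstacle_pos, keypoints, action,
                     w_reach=1.0, w_ee=0.5, w_body=0.5, w_smooth=0.01):
    d_target = np.linalg.norm(ee_pos - target_pos)
    d_ee = np.linalg.norm(ee_pos - obstacle_pos)
    d_body = min(np.linalg.norm(p - obstacle_pos) for p in keypoints)
    reward = -w_reach * d_target                  # reach: shrink target distance
    reward -= w_ee * np.exp(-10.0 * d_ee)         # avoid: end-effector vs. obstacle
    reward -= w_body * np.exp(-10.0 * d_body)     # avoid: closest body key point
    reward -= w_smooth * float(np.square(action).sum())  # smoothness penalty
    return float(reward)
```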
[Results] The algorithm is trained in a simulation environment that leverages Python and the PyBullet simulator, providing a realistic and efficient platform for agent training. The environment is encapsulated with the Gym framework and integrated with the Stable-Baselines3 library to streamline agent-environment interaction. Extensive simulations demonstrate the algorithm's ability to learn effective dynamic obstacle avoidance strategies: the average reward and success rate curves improve noticeably and eventually stabilize, indicating that the model reaches a relatively stable state and can navigate complex, dynamic environments. The trained model is subsequently deployed on a real robot manipulator equipped with a visual servoing system, consisting of a RealSense D435 camera and an OnRobot gripper mounted on a UR5 manipulator. The visual servoing system uses ArUco markers to detect obstacles and targets, while OpenCV handles image processing and pose estimation, enabling real-time environmental perception and precise manipulator control. Experimental results confirm the algorithm's practical effectiveness: the robot successfully avoids dynamic obstacles and reliably reaches target positions regardless of the direction of obstacle motion. Quantitative analysis shows that the end-effector's position error with respect to the target converges to zero and that joint velocities remain smooth throughout operation, validating the algorithm's precision and reliability.

[Conclusions] This study develops and validates a DRL-based dynamic obstacle avoidance algorithm for robot manipulators. By employing the soft actor-critic algorithm with a well-structured reward function, the proposed method demonstrates superior performance in navigating complex, dynamic environments. Deploying the trained model on a real robot manipulator integrated with a visual servoing system further confirms the algorithm's practical applicability. These results highlight the potential of DRL to enhance the autonomy and adaptability of robot manipulators, paving the way for future research on intelligent robotic systems.
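For readers who want to reproduce a comparable pipeline, the training setup described in the Results section maps naturally onto a few lines of Stable-Baselines3; this is a sketch assuming the environment class from the earlier snippet, and the hyperparameters shown are the library's defaults written out explicitly, not necessarily the paper's settings.

```python
# Sketch of SAC training with Stable-Baselines3 (hyperparameters are the
# library defaults shown for clarity, not necessarily the paper's values).
from stable_baselines3 import SAC

env = ManipulatorAvoidEnvSketch()  # the Gym-style environment sketched earlier
model = SAC("MlpPolicy", env,
            learning_rate=3e-4, buffer_size=1_000_000,
            batch_size=256, gamma=0.99, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("sac_manipulator_avoidance")
```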
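The ArUco-based perception step can likewise be sketched with OpenCV's aruco module (opencv-contrib-python, pre-4.7 API shown); the marker dictionary, marker size, and camera intrinsics here are placeholder assumptions, with the real intrinsics available from the RealSense SDK.

```python
# Illustrative ArUco detection and pose estimation with OpenCV's aruco module
# (requires opencv-contrib-python; pre-4.7 API). The dictionary, marker length,
# and camera intrinsics below are placeholder assumptions.
import cv2
import numpy as np

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters_create()
camera_matrix = np.array([[615.0, 0.0, 320.0],   # placeholder intrinsics; read
                          [0.0, 615.0, 240.0],   # the real values from the
                          [0.0, 0.0, 1.0]])      # RealSense SDK
dist_coeffs = np.zeros(5)
marker_length = 0.05  # marker side length in meters (assumed)

def marker_poses(frame):
    """Return {marker_id: (rvec, tvec)} for every detected ArUco marker."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=params)
    if ids is None:
        return {}
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_length, camera_matrix, dist_coeffs)
    return {int(i): (r, t) for i, r, t in zip(ids.flatten(), rvecs, tvecs)}
```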
Basic Information:
DOI: 10.16791/j.cnki.sjg.2025.04.010
China Classification Code: TP18; TP241
Citation Information:
[1] MAO J L, WANG Z, ZHOU X, et al. Design and experimental verification of a dynamic obstacle avoidance algorithm for robot manipulators based on deep reinforcement learning[J]. Experimental Technology and Management, 2025, 42(4): 78-85. DOI: 10.16791/j.cnki.sjg.2025.04.010. (in Chinese)
Fund Information:
National Natural Science Foundation of China (62203292)