To improve the convergence speed and training stability of the SAC (soft actor-critic) algorithm, an improved SAC algorithm is proposed that incorporates an advantage function and reward centering. To validate its performance, simulation analyses were conducted in a six-axis robotic arm path planning scenario, comparing it with DDPG (deep deterministic policy gradient), TD3 (twin delayed deep deterministic policy gradient), and the original SAC algorithm. The results show that the improved SAC outperforms DDPG, TD3, and SAC in both convergence speed and stability; after 1500 training episodes, its path planning success rate is 4.8% higher than that of SAC. Further experiments confirm the feasibility and effectiveness of the improved SAC algorithm's planning results in real-world environments.
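The reward-centering idea mentioned above can be sketched as follows: a running estimate of the average reward is maintained and subtracted from each observed reward before it enters the TD target. This is a minimal illustrative sketch, not the paper's implementation; the class name `RewardCenterer` and the step size `beta` are assumptions chosen for clarity.

```python
class RewardCenterer:
    """Running-mean reward centering.

    Keeps an exponential running estimate r_bar of the average reward
    and returns the centered reward (r - r_bar). The centered reward
    would then replace the raw reward in the critic's TD target.
    """

    def __init__(self, beta: float = 0.01):
        self.beta = beta    # step size for the running-mean update (illustrative value)
        self.r_bar = 0.0    # current estimate of the average reward

    def center(self, reward: float) -> float:
        # Update the running mean toward the new reward, then subtract it.
        self.r_bar += self.beta * (reward - self.r_bar)
        return reward - self.r_bar
```

With a constant reward stream the centered signal decays toward zero, which is the intended effect: the critic learns relative (differential) values instead of absorbing a large constant offset.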
LIU Y C, HUANG C Y. DDPG-based Adaptive Robust Tracking Control for Aerial Manipulators with Decoupling Approach[J]. IEEE Transactions on Cybernetics, 2021(99): 1-14.
[2] ALI A, SHI J F, ZHU Z H. Path Planning of 6-DOF Free-floating Space Robotic Manipulators Using Reinforcement Learning[J]. Acta Astronautica, 2024, 224: 367-378.
DAI Shengtan, WANG Yin, SHANG Chenchen. Multi-UAV Collaborative Path Planning Method Based on Deep Reinforcement Learning[J/OL]. Journal of Beijing University of Aeronautics and Astronautics, 2024. (2024-09-10)[2025-04-08].
LI Yongdi, LI Caihong, ZHANG Yaoyu, et al. Path Planning of Mobile Robots Based on the Improved SAC Algorithm[J]. Computer Applications, 2022, 43(2): 654-660.
[9] LIU Zhengfa. Deep Reinforcement Learning-based Mobile Robot Navigation for Local Path Planning[D]. Guiyang: Guizhou University, 2021.
[11] PEI Jiean. Research on Dynamic Obstacle Avoidance Planning Strategy of Robotic Arm Based on Deep Reinforcement Learning[D]. Nanchang: East China Jiaotong University, 2022.
[13] ZHANG Y, CHEN P. Path Planning of a Mobile Robot for a Dynamic Indoor Environment Based on an SAC-LSTM Algorithm[J]. Sensors, 2023, 23(24): 9802.
[14] KHALIL W, KLEINFINGER J. A New Geometric Notation for Open and Closed-loop Robots[C]∥IEEE International Conference on Robotics & Automation. San Francisco, 1986: 1174-1179.
JIN Yanxia, QIAO Xingyu, ZHANG Ling, et al. A Spatial Mesh Collision Detection Method Between Cloth and Rigid Body Models[J]. Journal of Image and Graphics, 29(10): 3144-3156.
[17] CAO X, ZOU X, JIA C, et al. RRT-based Path Planning for an Intelligent Litchi-picking Manipulator[J]. Computers and Electronics in Agriculture, 2019, 156: 105-118.
[18] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[J]. IEEE Transactions on Neural Networks, 1998, 9(5): 1054.
[19] BELLMAN R. A Markovian Decision Process[J]. Journal of Mathematics and Mechanics, 1957(6): 679-684.
ZHANG Yuhang, CHEN Wenbai, ZHANG Jiaqi, et al. A Deep Reinforcement Learning Strategy for Compliant Assembly of Six-degree-of-freedom Robotic Arms[J]. Journal of Chongqing University of Technology (Natural Science), 2025, 38(12): 148-154.
[22] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft Actor-Critic Algorithms and Applications[EB/OL]. [2025-04-08].
HE Lianluo, LI Tianhua, NIE Yuanhuang, et al. A Study on the Control of a 6-axis Robotic Arm Based on the DDPG Algorithm[J]. Journal of Chongqing University of Technology (Natural Science), 2023, 37(9): 134-140.
SHI Gaosong, ZHAO Qinghai, DONG Xin, et al. A Human-machine Interactive Reinforcement Learning Method for Autonomous Driving Based on the PPO Algorithm[J]. Application Research of Computers, 2024, 41(9): 2732-2736.
FANG Baofu, YU Tingting, WANG Hao, et al. Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(5): 435-446.
[29] NAIK A, WAN Y, TOMAR M, et al. Reward Centering[EB/OL]. [2025-04-08].