To address the lack of autonomous learning ability and the poor adaptability of traditional robotic arms in unknown environments, this paper proposes a deep reinforcement learning algorithm that incorporates prior knowledge. The reward function and the experience pool are redesigned to address low data quality, slow convergence, and unsatisfactory learning in the early stage of training, while improving the generalization of the model. Results on the PyBullet simulation platform show that, compared with the original algorithm, the improved model converges 48.5% faster and its success rate is 8% higher. Its convergence speed and success rate are also significantly better than those of other mainstream algorithms.
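Deep reinforcement learning (DRL) [1] is a method that combines the perception capability of deep learning with the decision-making capability of reinforcement learning. Reference [2] proposed the improved double deep Q-network (DDQN) algorithm, which uses two independent neural networks to estimate the value function separately and thereby addresses the overestimation problem of the original DQN algorithm. To address the problem that samples are drawn with equal probability during sampling, with no regard for their priority, reference [3] proposed the double deep Q-network with prioritized experience replay, which ranks samples by importance when selecting them and reduces training time. Reference [4] proposed hindsight experience replay to improve sample efficiency in sparse-reward environments. Reference [5] proposed the twin delayed deep deterministic policy gradient (TD3) algorithm as well as algorithms based on soft policies and soft values. Reference [6] used a meta-reinforcement learning framework based on TD3 and introduced a probabilistic context variable mechanism to handle robotic arm control. Reference [7] proposed an adaptive learning eco-driving strategy for signalized intersections, which uses the deep deterministic policy gradient (DDPG) algorithm to control and train vehicle acceleration in real time. Reference [8] proposed LSTM-MADDPG, a multi-agent cooperative decision algorithm based on asynchronous cooperative update; drawing on the ideas of difference rewards and value decomposition, it uses a long short-term memory (LSTM) network to extract features across trajectory sequences, optimizes the division of the global reward, and assigns an action reward to each agent. Reference [9] combined target position guidance with the TD3 algorithm for trajectory optimization, solving the problem of low learning efficiency in high-dimensional action spaces.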
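The difference between the DQN and DDQN targets mentioned above can be illustrated with a minimal sketch. It assumes the standard one-step formulation, in which the online network selects the greedy next action and the target network evaluates it. The function names, the discount factor, and the Q-values are hypothetical and chosen only for illustration; they are not taken from the cited works.

```python
from typing import Sequence


def dqn_target(reward: float, done: bool,
               q_target_next: Sequence[float], gamma: float = 0.99) -> float:
    """Original DQN target: the target network both selects and evaluates."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN target: action selection and evaluation are decoupled."""
    if done:
        return reward
    # The online network picks the greedy action in the next state ...
    best_action = max(range(len(q_online_next)),
                      key=lambda a: q_online_next[a])
    # ... and the target network supplies the value of that action.
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for three actions in the next state.
q_online = [1.0, 2.0, 0.5]
q_target = [1.5, 0.8, 3.0]

dqn = dqn_target(1.0, False, q_target, gamma=0.9)
ddqn = double_dqn_target(1.0, False, q_online, q_target, gamma=0.9)
print(f"DQN target: {dqn:.2f}, Double DQN target: {ddqn:.2f}")
```

In this example the target network assigns a high value to an action that the online network does not prefer, so the DQN target is larger than the DDQN target. This is the mechanism by which decoupling selection from evaluation limits overestimation.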
Shuai Lyu, Liu Jing. Stochastic local search heuristic method based on deep reinforcement learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(4): 1420-1426.
Li Bao-gang, Wang Yu, Kong Fan-wei, et al. Security status updates based on intelligent reflecting surface assistance and age of information metrics[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(10): 3014-3025.
Zhuang Wei-chao, Ding Hao-nan, Dong Hao-xuan, et al. Learning based eco-driving strategy of connected electric vehicle at signalized intersection[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(1): 82-93.
Gao Jing-peng, Wang Guo-xuan, Gao Lu. LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update[J]. Journal of Jilin University (Engineering and Technology Edition), 2024, 54(3): 797-806.
Zhang Qiang, Wen Wen, Zhou Xiao-dong, et al. Research on the manipulator intelligent trajectory planning method based on the improved TD3 algorithm[J]. Chinese Journal of Intelligent Science and Technology, 2022, 4(2): 223-232.
Xian Bin, Zhang Shi-jing, Han Xiao-wei, et al. Trajectory planning for unmanned aerial vehicle slung-payload aerial transportation system based on reinforcement learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(6): 2259-2267.
Zhao Hong-wei, Chen Xiao, Long Man-li, et al. Object classification algorithm based on improved PLSA[J]. Journal of Jilin University (Engineering and Technology Edition), 2012, 42(Sup.1): 231-235.
Liu Yong, Xu Lei, Zhang Chu-han. Deep reinforcement learning model for text games[J]. Journal of Jilin University (Engineering and Technology Edition), 2022, 52(3): 666-674.