1. School of Mathematics, Hohai University, Nanjing 211100, China
2. College of Artificial Intelligence and Automation, Hohai University, Changzhou 213000, China
3. School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 211100, China
Article History
Received
Accepted
Published
2025-01-21
Issue Date
2026-02-13
Abstract
To address the cooperative control problem of multi-UAV formation maintenance and target navigation in complex environments with dynamic obstacles, this paper proposes a multi-objective multi-agent twin delayed deep deterministic policy gradient (MO-MATD3) algorithm based on a virtual-center control architecture. First, a continuous, dense reward function is constructed from artificial potential field theory to improve the agents' learning efficiency on complex behavioral policies and to accelerate training convergence. Second, drawing on multi-objective programming, a mode-switching mechanism between formation navigation and obstacle avoidance is designed: when an agent detects an obstacle, it gives priority to the avoidance policy. This lets the agents switch policies when facing conflicting objectives such as formation keeping, navigation, and obstacle avoidance, so that the formation navigation task is completed with safety ensured. Finally, the effectiveness of the algorithm is verified through comparative experiments, its generalization is tested in generalization environments, and the robustness of the system is verified through parameter perturbation.
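The two mechanisms named in the abstract, a dense reward shaped by an artificial potential field and a mode switch that gives priority to obstacle avoidance, can be illustrated with a minimal sketch. This is not the paper's reward function: the function names, the gains `k_att`, `k_rep`, `k_form`, the detection radius `d_safe`, and the choice to measure navigation progress at the virtual center are all assumptions made for illustration.

```python
import math

def attractive_reward(center, goal, k_att=1.0):
    """Dense navigation term: negative scaled distance from the virtual center to the goal."""
    return -k_att * math.dist(center, goal)

def formation_reward(pos, center, offset, k_form=1.0):
    """Penalize deviation from the agent's slot, defined as virtual center + offset."""
    slot = tuple(c + o for c, o in zip(center, offset))
    return -k_form * math.dist(pos, slot)

def repulsive_penalty(pos, obstacle, d_safe=1.0, k_rep=1.0):
    """Classic APF repulsion, nonzero only inside the detection radius d_safe."""
    d = math.dist(pos, obstacle)
    if d >= d_safe:
        return 0.0
    d = max(d, 1e-6)  # guard against division by zero on contact
    return -k_rep * (1.0 / d - 1.0 / d_safe) ** 2

def reward(pos, goal, center, offset, obstacles, d_safe=1.0):
    """Return (mode, reward). Avoidance takes priority when any obstacle is detected."""
    near = [o for o in obstacles if math.dist(pos, o) < d_safe]
    if near:
        # Assumed avoidance mode: only the repulsive terms are active.
        return "avoid", sum(repulsive_penalty(pos, o, d_safe) for o in near)
    # Assumed formation-navigation mode: goal attraction plus slot keeping.
    return "formation_nav", attractive_reward(center, goal) + formation_reward(pos, center, offset)

if __name__ == "__main__":
    # Agent sits exactly on its slot; one obstacle is far away, another is close.
    print(reward((0, 0), (10, 0), (0, 0), (0, 0), [(5, 5)]))
    print(reward((0, 0), (10, 0), (0, 0), (0, 0), [(0.5, 0)]))
```

Because both terms vary continuously with distance, the agent receives a learning signal at every step rather than only on reaching the goal or colliding, which is the usual motivation for dense potential-field rewards. A hard switch like this one is the simplest reading of "priority to avoidance"; a weighted blend of the two modes would be an equally plausible design.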