Cooperative Guidance Method for Reentry Vehicles Based on MADDPG

WANG Jialei (王嘉磊), GUO Jianguo (郭建国)

弹道学报 (Journal of Ballistics), 2025, Vol. 37, Issue 4: 30-37. DOI: 10.12115/ddxb.2025.10006

Abstract

Cooperative guidance of multiple vehicles during the near-space reentry phase is challenged by strong aerodynamic coupling, pronounced nonlinear dynamics, and complex mission and threat constraints. Traditional guidance methods, which typically rely on analytical models or single-vehicle optimization strategies, are limited in real-time decision-making, constraint handling, and cooperative capability, and are therefore inadequate for future highly dynamic swarm engagement scenarios. To address this problem, a master-slave cooperative guidance method based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm is proposed. First, a master-slave relative dynamics model is constructed in the line-of-sight (LOS) coordinate system, providing the theoretical basis for modeling the cooperative formation of multiple vehicles. Second, to improve policy learning under multiple constraints, a composite reward function is designed around the LOS rate, the relative distance-keeping error, and the formation deviation, and a radar threat zone penalty term is added, so that formation keeping, terminal requirements, and threat avoidance are described in a unified way. Finally, a centralized training, decentralized execution paradigm is adopted, with a residual network architecture used for policy learning and training of the master and slave vehicles, achieving coordinated control of multiple vehicles. Simulation results show that the proposed method significantly outperforms traditional guidance strategies in control accuracy, stability, and computational efficiency. The learned policies keep the slave vehicles in stable formation behind the master vehicle under highly dynamic conditions, markedly reduce relative distance error and LOS angle jitter, and effectively avoid radar threat zones, improving the overall quality of cooperative guidance and the mission success rate. The work provides a scalable, intelligent, and highly reliable technical path for multi-vehicle cooperative guidance in the near-space reentry phase, improving the stability and decision-making capability of multi-vehicle coordination.
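The composite reward described in the abstract can be illustrated with a minimal sketch. The abstract names only the ingredients (LOS rate, relative distance-keeping error, formation deviation, and a radar threat zone penalty); the weighted-sum form, the weight values, the linear penalty shape, and every name below (`RewardWeights`, `threat_penalty`, `composite_reward`) are assumptions made for illustration, not the authors' formulation. Inputs are assumed to be pre-normalized to comparable scales.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract gives no numerical values.
    los_rate: float = 1.0
    range_error: float = 1.0
    formation: float = 1.0
    threat: float = 10.0


def threat_penalty(pos: Sequence[float], zone_center: Sequence[float],
                   zone_radius: float) -> float:
    """Assumed penalty shape: 1.0 at the zone center, falling linearly
    to 0.0 at the zone boundary and staying 0.0 outside it."""
    dist = math.dist(pos, zone_center)
    if dist >= zone_radius:
        return 0.0
    return 1.0 - dist / zone_radius


def composite_reward(los_rate: float, range_error: float, formation_dev: float,
                     pos: Sequence[float], zone_center: Sequence[float],
                     zone_radius: float,
                     w: Optional[RewardWeights] = None) -> float:
    """Negative weighted cost for one slave vehicle at one time step.

    A reward closer to zero means smaller LOS rate, smaller distance-keeping
    error, smaller formation deviation, and no intrusion into the radar zone.
    """
    if w is None:
        w = RewardWeights()
    cost = (w.los_rate * abs(los_rate)
            + w.range_error * abs(range_error)
            + w.formation * abs(formation_dev)
            + w.threat * threat_penalty(pos, zone_center, zone_radius))
    return -cost
```

In a MADDPG setup, a function of this kind would be evaluated per agent at each simulation step, with the centralized critics seeing all agents' observations and actions during training. Shaping the threat term as a graded penalty rather than a hard termination is one design choice among several; the paper may handle it differently.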

Keywords

multi-aircraft formation / MADDPG algorithm / reentry phase / cooperative guidance

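Cite this article

WANG Jialei, GUO Jianguo. Cooperative guidance method for reentry vehicles based on MADDPG[J]. 弹道学报 (Journal of Ballistics), 2025, 37(4): 30-37. DOI: 10.12115/ddxb.2025.10006. (in Chinese)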

Funding

National Natural Science Foundation of China (52472419)
