Dynamic Trajectory Planning for Hypersonic Reentry Based on TD3

ZHANG Han; WANG Zhengqiang; WANG Lifeng; WANG Dehao; ZHANG Xun (corresponding author)

Journal of Ballistics, 2025, 37(4): 48-56. DOI: 10.12115/ddxb.2025.10013

Abstract

During penetration, hypersonic vehicles must simultaneously satisfy physical path constraints, namely heat-flux, dynamic-pressure, and overload limits. They must also satisfy mission constraints such as no-fly zones and interceptor evasion. Conventional trajectory planning methods struggle to plan and decide in real time, within milliseconds, while satisfying all of these constraints. To address this, this paper proposes a dynamic trajectory planning method based on the twin delayed deep deterministic policy gradient (TD3) algorithm. The reentry trajectory planning problem is modeled as a Markov decision process. A suitably defined state space and action space, together with a composite reward function that incorporates multiple constraints, drive the agent to autonomously learn an optimal control policy in a simulation environment. The reward function accounts for path constraints (heat flux, dynamic pressure, and overload) and mission objectives (no-fly zone avoidance and interceptor evasion), balancing these competing objectives effectively. Simulation results show that the proposed method satisfies all path constraints while accurately reaching the terminal target position, improving the robustness and autonomous decision-making capability of the reentry vehicle. A comparison with the widely used soft actor-critic (SAC) algorithm further shows that TD3 produces smoother control commands, trains more efficiently, and converges to a more stable policy. The method offers a practical and efficient solution for intelligent guidance of hypersonic vehicles in complex interception environments.
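The abstract names the ingredients of the method but not their concrete form. What follows is therefore a minimal sketch under stated assumptions, not the authors' implementation.

The first part shows one possible composite reward. It combines a progress term with weighted penalties for four kinds of violation:
- exceeding the heat-flux, dynamic-pressure, or overload limits;
- entering a circular no-fly zone;
- entering an interceptor's kill radius.

The helper names (`path_quantities`, `excess`, `composite_reward`) are hypothetical. So are the heat-flux form `K_Q * sqrt(rho) * v**3.15` and every limit and weight; all are placeholders.

The second part shows the standard TD3 update ingredients:
- the critic target, using clipped double-Q learning with target policy smoothing;
- the delayed actor update;
- Polyak-averaged target parameters.

Plain callables stand in for the actor and critic networks.

```python
import numpy as np

# ----- Part 1: composite reward (hypothetical; placeholders, not paper values) -----
G0 = 9.80665        # standard gravity, m/s^2
K_Q = 9.4369e-5     # heat-flux coefficient (placeholder)
QDOT_MAX = 2.0e6    # heat-flux limit, W/m^2 (assumed)
QDYN_MAX = 1.0e5    # dynamic-pressure limit, Pa (assumed)
N_MAX = 3.0         # overload limit, g (assumed)


def path_quantities(rho, v, lift, drag, mass):
    """Heat flux, dynamic pressure and overload for one flight state."""
    qdot = K_Q * np.sqrt(rho) * v ** 3.15    # empirical stagnation-point form (assumed)
    qdyn = 0.5 * rho * v ** 2                # dynamic pressure
    n = np.hypot(lift, drag) / (mass * G0)   # aerodynamic overload, in g
    return qdot, qdyn, n


def excess(value, limit):
    """Relative amount by which value exceeds limit; 0.0 when satisfied."""
    return max(0.0, float(value) / limit - 1.0)


def composite_reward(rho, v, lift, drag, mass, range_to_go, range_ref,
                     d_nfz, nfz_radius, d_interceptor, kill_radius,
                     w_goal=1.0, w_path=10.0, w_nfz=10.0, w_int=10.0):
    """Progress term minus weighted penalties for path and mission constraints."""
    qdot, qdyn, n = path_quantities(rho, v, lift, drag, mass)
    r_goal = -w_goal * range_to_go / range_ref           # closer to target is better
    r_path = -w_path * (excess(qdot, QDOT_MAX) + excess(qdyn, QDYN_MAX)
                        + excess(n, N_MAX))
    # Penalty grows linearly with penetration depth into each circular zone.
    r_nfz = -w_nfz * max(0.0, 1.0 - d_nfz / nfz_radius)
    r_int = -w_int * max(0.0, 1.0 - d_interceptor / kill_radius)
    return r_goal + r_path + r_nfz + r_int


# ----- Part 2: standard TD3 update ingredients -----
def td3_critic_target(reward, done, s_next, actor_targ, q1_targ, q2_targ,
                      gamma=0.99, sigma=0.2, noise_clip=0.5,
                      a_low=-1.0, a_high=1.0, rng=None):
    """Bellman target with target policy smoothing and clipped double-Q."""
    rng = np.random.default_rng() if rng is None else rng
    a_next = np.asarray(actor_targ(s_next), dtype=float)
    noise = np.clip(rng.normal(0.0, sigma, size=a_next.shape),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, a_low, a_high)       # smoothed target action
    q_min = np.minimum(q1_targ(s_next, a_next), q2_targ(s_next, a_next))
    return reward + gamma * (1.0 - done) * q_min


def actor_update_due(step, policy_delay=2):
    """Delayed policy update: actor and targets change every policy_delay steps."""
    return step % policy_delay == 0


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target parameter arrays toward the online ones."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

In a training loop of this kind, the reward would be evaluated at each step of the simulated reentry dynamics. `td3_critic_target` would supply the regression target shared by both critics. The sketch penalizes relative excess (value/limit - 1) rather than absolute excess, which keeps the three path-constraint terms on comparable scales. This weighting is an assumption of the sketch, not something stated in the abstract.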

Key words

hypersonic vehicle / reentry / trajectory planning / deep reinforcement learning / TD3 algorithm / multi-constraint optimization / penetration

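Cite this article

ZHANG Han, WANG Zhengqiang, WANG Lifeng, WANG Dehao, ZHANG Xun. Dynamic trajectory planning for hypersonic reentry based on TD3[J]. Journal of Ballistics, 2025, 37(4): 48-56. DOI: 10.12115/ddxb.2025.10013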
