Abstract
To address low sample efficiency, unstable training, and the difficulty of designing reward functions in multi-agent shooting games, this paper improves the MA-POCA (multi-agent posthumous credit assignment) algorithm by introducing a hierarchical reward mechanism based on time decay. First, a training environment is built in Unity3D so that the agents can interact with the environment. Ray sensors and the Unity API are then used to construct the observation system, and a hybrid action space is designed so that the agents can make decisions autonomously. Next, the model is built on MA-POCA improved by the time-decay hierarchical reward mechanism, which addresses the credit assignment problem in long-horizon tasks, and a spatiotemporal attention mechanism performs memory retrieval to improve tactical continuity. Simulation results show that after 30 million training steps, the agents progressed from individual combat to advanced team collaboration and learned tactical behaviors such as crossfire. The optimized algorithm significantly improves the agents' tactical synchronization rate and can serve as a reference for further research in fields such as game AI and robot collaboration.
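The abstract does not give the reward formula, so the following is only a minimal sketch of one possible reading of a "hierarchical reward based on time decay", not the authors' method. It assumes three hypothetical tiers (`individual`, `tactical`, `team`) whose weights decay exponentially with the training step, so that dense individual shaping fades while the sparse team reward persists; this matches the reported progression from individual combat to team collaboration. All tier names, weights, and decay rates are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class RewardLayer:
    """One tier of a hypothetical hierarchical reward; names are illustrative."""
    name: str
    weight: float      # base weight of the tier at training step 0
    decay_rate: float  # lambda in weight * exp(-lambda * step); 0.0 disables decay


def layer_scale(layer: RewardLayer, step: int) -> float:
    """Return the time-decayed weight of one tier at a given training step."""
    return layer.weight * math.exp(-layer.decay_rate * step)


def hierarchical_reward(layers: Sequence[RewardLayer],
                        raw_rewards: Mapping[str, float],
                        step: int) -> float:
    """Combine per-tier raw rewards into a single scalar for one agent.

    Tiers absent from raw_rewards contribute 0.0.
    """
    return sum(layer_scale(layer, step) * raw_rewards.get(layer.name, 0.0)
               for layer in layers)


# Hypothetical tiers: dense individual shaping is almost gone by ~10M steps,
# tactical (squad-level) shaping decays slowly, and the sparse team outcome
# reward never decays, nudging agents from solo play toward teamwork.
LAYERS = (
    RewardLayer("individual", weight=0.1, decay_rate=1e-6),
    RewardLayer("tactical", weight=0.5, decay_rate=1e-7),
    RewardLayer("team", weight=1.0, decay_rate=0.0),
)

if __name__ == "__main__":
    raw = {"individual": 1.0, "tactical": 1.0, "team": 1.0}
    for step in (0, 1_000_000, 10_000_000, 30_000_000):
        print(f"step {step:>10,}: reward {hierarchical_reward(LAYERS, raw, step):.4f}")
```

With every tier firing a raw reward of 1.0, the combined reward falls from 1.6000 at step 0 to about 1.0249 at 30 million steps, so by the end of training almost all of the signal comes from the team tier. In a Unity ML-Agents setup, one alternative design would be to route the team tier through `SimpleMultiAgentGroup.AddGroupReward`, which MA-POCA's centralized critic consumes, and keep only the decaying tiers in each agent's `AddReward`.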
Jiaxin LIANG, Haotian MIAO, Boyou LI, Yueqiu JIANG (corresponding author)
School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China
About the first author: Jiaxin LIANG (born 1999), female, master's student.
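LIANG Jiaxin, MIAO Haotian, LI Boyou, JIANG Yueqiu. Research on multi-agent shooting games based on deep reinforcement learning[J]. Journal of Shenyang Ligong University, 2026, 45(4): 1-7. DOI: 10.3969/j.issn.1003-1251.2026.04.001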
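Funding
Research Support Program for Introduced High-Level Talents of Shenyang Ligong University (1010147001225)
Ministry of Education Supply-Demand Matching Employment and Education Project (2023122570529)
Fundamental Research Funds for Liaoning Provincial Undergraduate Universities (SYLUGXTDO7)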