Engineering Training Center, Shenyang Aerospace University, Shenyang 110136, China
Article history
Received
Accepted
Published
2024-12-24
2025-01-12
Issue Date
2025-10-30
Abstract
Tibetan Jiu chess is a traditional folk board game and a perfect-information game that carries the rich heritage of Tibetan civilization and culture. Because its rule system is complex and its positions vary widely, traditional game-tree search algorithms struggle to cope with its large board and complex decision-making demands. To raise the playing strength of Tibetan Jiu chess programs, this work proposes an optimization strategy for the Monte Carlo tree search (MCTS) algorithm that incorporates prior knowledge. For the key phases of layout planning and piece movement, a strategy-selection optimization function and an evaluation function are designed on the basis of deep reinforcement learning and the prior knowledge of domain experts. These functions guide the MCTS search process, and training yields the best model, which is able to generate high-quality moves. Experiments show that the improved MCTS algorithm achieves significant results in play.
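At present, research on Tibetan Jiu chess game playing concentrates mainly on traditional search techniques, such as expert knowledge [6], Alpha-Beta pruning [7], and Monte Carlo tree search (MCTS) [8]. Wang et al. [9] proposed combining an improved upper confidence bounds applied to trees (UCT) algorithm with a neural network. Li et al. [10] proposed a phased game algorithm for Tibetan Jiu chess that uses an improved UCT algorithm in the layout phase and a neural network to guide MCTS in the movement phase. However, because conventional MCTS has no built-in mechanism tailored to the rules and strategies of Tibetan Jiu chess, it cannot make efficient use of rule-based and experience-based strategies.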
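The limitation described above can be illustrated with a minimal sketch that contrasts the standard UCT selection score with an AlphaZero-style PUCT score, in which a prior probability scales the exploration bonus. The `Node` class, the function names, the constants `c` and `c_puct`, and the toy statistics are all hypothetical and chosen only for illustration; this is not the strategy selection function or evaluation function designed in this work, which this section does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Statistics for one child move (hypothetical structure for illustration)."""
    prior: float = 0.0       # assumed prior probability, e.g. from expert knowledge
    visits: int = 0          # number of times this child has been visited
    value_sum: float = 0.0   # sum of backed-up simulation results

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Standard UCT: mean value plus a visit-count exploration bonus (no prior)."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    return child.mean_value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def puct_score(child: Node, parent_visits: int, c_puct: float = 1.5) -> float:
    """AlphaZero-style PUCT: the exploration bonus is weighted by the prior."""
    bonus = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.mean_value() + bonus


def select(children: dict, parent_visits: int, score) -> str:
    """Return the move whose child maximizes the given score function."""
    return max(children, key=lambda move: score(children[move], parent_visits))


# Toy example: both moves have the same mean value (0.5),
# but move "a" is favored by the assumed prior.
children = {
    "a": Node(prior=0.8, visits=20, value_sum=10.0),
    "b": Node(prior=0.2, visits=5, value_sum=2.5),
}
parent_visits = 25

print(select(children, parent_visits, uct_score))   # "b": fewer visits, larger bonus
print(select(children, parent_visits, puct_score))  # "a": the prior outweighs the visit gap
```

In this toy setting plain UCT picks the less-visited move because it sees only visit counts and mean values, while the prior-weighted score keeps the search on the move that the assumed knowledge source prefers.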
Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[3] McGrath T, Kapishnikov A, Tomašev N, et al. Acquisition of chess knowledge in AlphaZero[J]. Proceedings of the National Academy of Sciences of the United States of America, 2022, 119(47): e2206625119.
[4] Schmid M, Moravčík M, Burch N, et al. Student of games: a unified learning algorithm for both perfect and imperfect information games[J]. Science Advances, 2023, 9(46): 3256.
[5] Li X L, Deng S T. Review of research on computer games for Tibetan Jiu chess[C]//IEEE 14th International Conference on Dependable, Autonomic and Secure Computing. Auckland: IEEE, 2016: 97-99.
[6] Li X L, Wang S, Lv Z Y, et al. Strategy research based on chess shapes for Tibetan Jiu computer game[J]. ICGA Journal, 2018, 40(3): 318-328.
Naderzadeh Y, Grosu D, Chinnam R B. PPB-MCTS: a novel distributed-memory parallel partial-backpropagation Monte Carlo tree search algorithm[J]. Journal of Parallel and Distributed Computing, 2024, 193: 104944.
[9] Wang Y J, Liang K, Qiao J L, et al. The application of improved UCT combined with neural network in Tibetan Jiu chess[J]. International Journal of Wireless and Mobile Computing, 2022, 23(1): 22.
[10] Li X L, Chen Y D, Zhang Y Y, et al. A phased game algorithm combining deep reinforcement learning and UCT for Tibetan Jiu chess[C]//2023 IEEE 47th Annual Computers, Software, and Applications Conference. Torino: IEEE, 2023: 390-395.
Wang Q, He Y Q, Tang C L. Mastering construction heuristics with self-play deep reinforcement learning[J]. Neural Computing and Applications, 2023, 35(6): 4723-4738.
[13] Shen Q W, Ding M, Li S Q, et al. Research on jiuqi game strategy based on chess shape[C]//2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence. Sanya: ACM, 2020: 1-5.
Khan M, Olivier J. Regression to the mean: estimation and adjustment under the bivariate normal distribution[J]. Communications in Statistics-Theory and Methods, 2023, 52(19): 6972-6990.
[16] Dong S, Wang P, Abbas K. A survey on deep learning and its applications[J]. Computer Science Review, 2021, 40: 100379.
[17] Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359.
[18] Tan M X, Le Q V. EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL]. (2020-09-11)[2024-06-05].
[19] Silver D, Hubert T, Schrittwieser J, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[EB/OL]. (2020-09-11)[2024-06-05].
[20] Ribeiro E S, Araújo L R G, Chaves G T L, et al. Distance-based loss function for deep feature space learning of convolutional neural networks[J]. Computer Vision and Image Understanding, 2024, 249: 104184.