To address the insufficient safety performance and low learning efficiency of traditional autonomous driving systems, an autonomous driving safety decision-making model capable of continuous learning and of understanding linguistic information is proposed. Drawing on the reasoning, decision-making, and experience accumulation processes of human driving, the model uses a large language model (LLM) as the decision-making agent and integrates chain-of-thought reasoning, a two-stage attention mechanism, and cognitive memory storage and retrieval into the contextual safety learning of the driving process. A kinematic module converts LLM decisions into executable driving commands, enabling the continuous learning of safe driving experience. Experimental results show that the proposed model significantly improves safety and efficiency compared with rule-based, reinforcement-learning-based, and knowledge-based approaches. The model can also learn continuously and adapt its driving behavior to human instructions, providing a reference for human-like autonomous driving.
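Large language models (LLMs) show early signs of progress toward artificial general intelligence and have outstanding abilities in out-of-distribution (OOD) reasoning, commonsense understanding, knowledge retrieval, and communication with humans through natural language. These abilities closely match the needs of fields such as autonomous driving and robotics. LLMs have shown strong potential in robot manipulation, multimodal understanding, and lifelong learning [5], and they also perform remarkably well in context understanding, answer generation, and complex task handling. Integrating LLMs into autonomous driving has attracted wide attention and has effectively enhanced the decision-making capability of autonomous vehicles [6]. By learning human driving behavior and trajectory planning through chain-of-thought reasoning [7], LLMs can understand complex scenarios as human drivers do. Integrating an LLM into the decision-making module can markedly increase user trust and generalize driving experience to a variety of driving scenarios. This paper uses the short-term learning ability and multi-source input understanding of LLMs to explore their potential in continuous learning, helping autonomous driving systems adapt quickly and effectively to changing driving environments and achieve the goals of maximizing safety and learning continuously.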
1 Decision framework
1.1 Model architecture
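Neural-network-based autonomous driving systems lack direct compatibility with human prior knowledge, which limits their potential to use such knowledge to improve driving performance. To address this challenge, this paper proposes a new decision-making method based on language reasoning and cognitive memory (Language reasoning and cognitive memory method, LRCMM). Text and symbols are inherently suited to logical reasoning, knowledge retrieval, and human communication [8], making them an excellent medium for exploiting LLM capabilities. This paper therefore uses text as a unified interface connecting neural networks and experiential knowledge. The overall model structure is shown in Fig. 1 and mainly comprises: ① a simulation environment that interacts with the agent; ② a decision-maker with recall and reasoning capabilities; ③ a memory component that stores and reads driving experience. The agent obtains environment information, queries the memory module for relevant experience, and executes a decision; the collected information and decisions are then used to update the cognitive memory. This paper adopts a safety-centered approach, using the strengths of LLMs under prediction uncertainty to formulate safety constraints for the low-level model predictive control (MPC).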
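The observe, recall, decide, and update cycle described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the token-overlap similarity, the `stub_decider` placeholder standing in for the LLM call, the action labels `IDLE` and `SLOWER`, and all class and function names are assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    scene: str      # textual description of a driving scene
    decision: str   # decision taken in that scene


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two scene texts (assumed metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class CognitiveMemory:
    items: list = field(default_factory=list)

    def retrieve(self, scene: str, k: int = 2) -> list:
        """Return the k stored experiences most similar to the scene."""
        ranked = sorted(self.items,
                        key=lambda e: jaccard(scene, e.scene),
                        reverse=True)
        return ranked[:k]

    def store(self, scene: str, decision: str) -> None:
        self.items.append(Experience(scene, decision))


def describe(obs: dict) -> str:
    """Render a structured observation as text (the unified interface)."""
    return (f"ego speed {obs['ego_speed']} m/s, "
            f"lead vehicle gap {obs['gap']} m, "
            f"lead speed {obs['lead_speed']} m/s")


def stub_decider(scene: str, recalled: list) -> str:
    """Hypothetical stand-in for the LLM: reuse the best recalled decision."""
    return recalled[0].decision if recalled else "IDLE"


def step(obs: dict, memory: CognitiveMemory, decider=stub_decider) -> str:
    scene = describe(obs)                # 1. observe the environment as text
    recalled = memory.retrieve(scene)    # 2. query the memory for experience
    decision = decider(scene, recalled)  # 3. decide (LLM placeholder)
    memory.store(scene, decision)        # 4. update the cognitive memory
    return decision


if __name__ == "__main__":
    memory = CognitiveMemory()
    print(step({"ego_speed": 20, "gap": 50, "lead_speed": 20}, memory))
```

In a fuller version, `decider` would wrap a call to an LLM with the scene text and the recalled experiences in the prompt, and the returned decision would be passed on to the low-level controller rather than only stored.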
Ma Yi-ning, Jiang Wei, Wu Jing-yu, et al. Self-evolution scenarios for simulation tests of autonomous vehicles based on different models of driving styles[J]. China Journal of Highway and Transport, 2023, 36(2): 216-228.
Zhu Bo, Zhang Ji-wei, Tan Dong-kui, et al. End-to-end autonomous driving method based on multi-source sensor and navigation map[J]. Journal of Automotive Safety and Energy, 2022, 13(4): 738-749.
[8] Zhang Q X, Zhao Y H, Wang Y J, et al. Towards cross-task universal perturbation against black-box object detectors in autonomous driving[J]. Computer Networks, 2020, 180: No.107388.
[9] Wang S Y, Zhu Y X, Li Z H, et al. ChatGPT as your vehicle co-pilot: An initial attempt[J]. IEEE Transactions on Intelligent Vehicles, 2023, 8(12): 4706-4721.
[10] Cui Y D, Huang S C, Zhong J M, et al. DriveLLM: charting the path toward full autonomous driving with large language models[J]. IEEE Transactions on Intelligent Vehicles, 2023, 9(1): 1450-1464.
[11] Kojima T, Gu S S, Reid M, et al. Large language models are zero-shot reasoners[J]. Advances in Neural Information Processing Systems, 2022, 35: 22199-22213.
Wang Xiang, Tan Guo-zhen. Research on decision-making of autonomous driving in highway environment based on knowledge and large language model[J]. Journal of System Simulation, 2025(5): 1246-1255.
[14] Peng Y F, Tan G Z, Si H W, et al. DRL-GAT-SA: Deep reinforcement learning for autonomous driving planning based on graph attention networks and simplex architecture[J]. Journal of Systems Architecture, 2022, 126: No.102505.
Hu Hong-yu, Zhang Hui-jun, Yao Rong-han, et al. Driver's situational awareness in takeover process of L3 automated vehicles[J]. Journal of Jilin University (Engineering and Technology Edition), 2024, 54(2): 410-418.
[17] Nie X T, Liang Y P, Ohkura K. Autonomous highway driving using reinforcement learning with safety check system based on time-to-collision[J]. Artificial Life and Robotics, 2023, 28(1): 158-165.
[18] Chang M K, Lee S H, Chung C C. Comparative evaluation of dynamic and kinematic vehicle models[C]∥Conference on Decision and Control, Los Angeles, CA, USA, 2015: 648-653.
[19] Treiber M, Hennecke A, Helbing D. Congested traffic states in empirical observations and microscopic simulations[J]. Physical Review E, 2000, 62(2): 1805.
[20] Xin L, Kong Y T, Li S E, et al. Enable faster and smoother spatio-temporal trajectory planning for autonomous vehicles in constrained dynamic environment[J]. Journal of Automobile Engineering, 2021, 235(4): 1101-1112.
[21] Li G F, Li S L, Li S, et al. Deep reinforcement learning enabled decision-making for autonomous driving at intersections[J]. Automotive Innovation, 2020, 3: 374-385.