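Kalman State Estimation Algorithm Based on Q-learning Under Unknown Model Parameters

YANG Wenying; LUAN Xiaoli; LIU Fei

Advanced Engineering Sciences (工程科学与技术), 2025, Vol. 57, Issue (06): 335-343. DOI: 10.12454/j.jsuese.202301029

Computer Science and Technology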


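Abstract

High-precision Kalman state estimation requires the model parameters and the noise statistics to be known exactly; otherwise estimation performance degrades markedly and the filter may even diverge. To address this limitation, a Kalman state estimation algorithm that learns while it estimates is proposed for the case where both the model parameters and the noise statistics are unknown. The algorithm handles the missing model information with Q-learning policy iteration, which consists of policy improvement and policy evaluation. In the policy improvement stage, a state-action value function (the Q function) that evaluates the state estimate is first defined and then transformed so that the estimate depends only on the observations and not on the model parameters; an estimation policy for obtaining the state estimate is then derived from the Q function. In the policy evaluation stage, the information matrix of the Q function is first identified by recursive least squares; then, based on the identified information matrix, the derived estimation policy is followed, the corresponding action is executed, and the state estimate is updated. Finally, the algorithm is applied to estimating the state of a two-state polynomial system and the water levels of a quadruple-tank system to verify its effectiveness and feasibility, and it is compared with a joint estimation algorithm. Simulation results show that, under Gaussian system noise, the proposed algorithm reduces the average root mean square error by 34.66% and 79.93% for the two systems relative to the joint estimation algorithm, giving higher accuracy and stronger robustness to parameter uncertainty; under non-Gaussian system noise, its accuracy is close to that of the joint estimation algorithm. In addition, in the experiments on the two-state polynomial system with Gaussian noise, the two-state polynomial system with non-Gaussian noise, and the quadruple-tank system, the running time is reduced by 44.38%, 45.03%, and 47.78%, respectively, relative to the joint estimation algorithm, which effectively improves real-time performance. The proposed algorithm offers ideas and methods for applying and extending Kalman state estimation in practical engineering.

Extended abstract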

Objective: Kalman filtering (KF), a widely used state estimation algorithm, plays a crucial role in estimating system state variables. High-precision Kalman state estimation requires accurate knowledge of the model parameters and noise statistical characteristics. Otherwise, estimation performance degrades significantly and filter divergence can occur. In practical applications, however, many system model parameters and noise statistical characteristics are unknown or inaccurate. Therefore, a Q-learning-based Kalman filtering (QL‒KF) algorithm is proposed that learns and estimates simultaneously when model parameters and noise statistical characteristics are unknown.

Methods: The Q-learning policy iteration algorithm, which is divided into policy improvement and policy evaluation, was employed to address the unknown model information. In the policy improvement stage, a state-action value function (Q function) that evaluates the estimated state value was defined. A formula transformation was then used so that the estimated value depends only on observed values, eliminating the need for model parameters. In addition, two adjustable weight matrices were introduced to calculate the Kalman gain, avoiding reliance on the system noise statistical characteristics. An estimation policy for obtaining system state estimates was then derived from the Q function. In the policy evaluation stage, the estimation of the Q function was transformed into the estimation of its information matrix, and the recursive least squares algorithm was applied to identify the information matrix. Based on the identified information matrix, the estimation policy was followed to execute the corresponding actions and update the estimated values of the state variables. Finally, the proposed algorithm was applied to estimate the state of a two-state polynomial system and the water level of a quadruple water tank system to verify its effectiveness and feasibility, and it was compared with a joint state and parameter estimation algorithm.

Results and Discussions: The estimation performance of the QL‒KF algorithm was analyzed under unknown model parameters and noise statistical characteristics. A Monte Carlo experiment was conducted, and 50 Monte Carlo simulations were performed to enhance the credibility of the simulation. Uncertainty was introduced into each parameter to verify the robustness of the proposed algorithm. The root mean square error (RMSE) and the average RMSE (ARMSE) were used as performance evaluation metrics. For the two-state polynomial system, when both the process noise and the measurement noise were Gaussian, the RMSE of the QL‒KF algorithm exhibited a strong convergence trend, demonstrating the effectiveness of the algorithm. Because the initial estimates were randomly assigned and the Q-learning algorithm required some data accumulation, the initial RMSE was slightly larger and fluctuated, but it decreased with an increasing number of iterations and gradually stabilized. When the model parameters were known, the RMSE of the standard KF algorithm was low and very stable. However, when the model parameters were unknown to both algorithms, the proposed QL‒KF algorithm achieved significantly better estimation accuracy than the standard KF algorithm, demonstrating stronger robustness. Compared to the EVIU algorithm (a joint state and parameter estimation algorithm), the RMSE of the QL‒KF algorithm was smaller and more stable after convergence, with the ARMSE reduced by 34.66% on average, indicating higher estimation accuracy. It also demonstrated stronger robustness under parameter uncertainties. In addition, the algorithm required less computational time, reducing the average running time by 44.38%, and exhibited high real-time performance. When both the process noise and the measurement noise were non-Gaussian, the RMSE of the QL‒KF algorithm still exhibited a convergence trend, confirming the algorithm's effectiveness. When the system parameters were unknown, the estimation error of the proposed algorithm was lower than that of the KF algorithm and similar to that of the EVIU algorithm. The running time of the QL‒KF algorithm was reduced by 45.03% compared to the EVIU algorithm, indicating higher real-time performance. However, compared to the Gaussian noise case, the estimation error of the QL‒KF algorithm increased, indicating that the type of noise affects the estimation accuracy of the proposed algorithm. For the quadruple water tank system, the RMSE of the QL‒KF algorithm for both state components showed favorable trends, demonstrating the effectiveness of the algorithm. Compared to the EVIU algorithm, the proposed algorithm exhibited stronger robustness under parameter uncertainties, with smaller estimation errors, an average ARMSE reduction of 79.93%, and a decrease in running time of 47.78%, indicating good real-time performance.

Conclusions: The findings indicate that the proposed QL‒KF algorithm can use only observations, without identifying system parameters, to estimate the internal state of systems when the model parameters and noise statistical characteristics are unknown. The estimation accuracy of the algorithm is influenced by the type of system noise. For Gaussian noise systems, the algorithm demonstrates high estimation accuracy, robust performance, and strong real-time capability. For non-Gaussian noise systems, however, the estimation accuracy decreases. Future work will focus on further improving estimation accuracy.


Key words

Kalman state estimation / Q-learning / unknown model parameters / unknown noise statistical characteristics

Author information

Wenying YANG (杨雯莹), born 1999, female, master's student. Research interests: control engineering and applications. E-mail: 6221913011@stu.jiangnan.edu.cn

Xiaoli LUAN (栾小丽), corresponding author. E-mail: xlluan@jiangnan.edu.cn

Fei LIU (刘飞)

Affiliation (all authors): Key Laboratory for Advanced Process Control of Light Industry of Ministry of Education, Institute of Automation, Jiangnan University, Wuxi 214122, China (江南大学自动化研究所轻工过程先进控制教育部重点实验室，江苏无锡 214122)

Cite this article

杨雯莹,栾小丽,刘飞. 未知模型参数下基于Q学习的卡尔曼状态估计算法[J]. 工程科学与技术, 2025, 57(06): 335-343 DOI:10.12454/j.jsuese.202301029


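In practical industrial processes, limits on measurement techniques, cost, and other factors mean that the true values of system state variables often cannot be obtained directly and must be produced by a state estimation method that is inexpensive and reliable; the Kalman filter (KF) emerged to meet this need^[1‒2]. With the development of computer technology, the KF has been widely studied and applied thanks to its simplicity, efficiency, and globally optimal estimates^[3‒4], and many improved and extended variants have been derived from it^[5‒7]. However, the KF relies on exactly known system model parameters and noise statistics, and the noise must be Gaussian white noise. As a result, once model mismatch or colored noise is present, the estimation performance of the KF degrades markedly and the filter may even diverge^[8]. Adaptive KF algorithms can handle state estimation under model mismatch or colored noise^[9]; commonly used methods include the Sage‒Husa adaptive KF, the innovation-based adaptive KF, and the multiple-model KF^[10‒12].

However, most such adaptive estimation algorithms only consider the case where the environment deviates from the true system. In practice, because of system complexity, uncertainty, or limits on time and resources, the model parameters and noise statistics of a system are often hard to obtain in detail, or are even completely unknown^[13‒14].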

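Two main approaches address state estimation when model information is unknown. The first identifies the model parameters and then estimates the internal state^[15‒16]. Recursive identification^[17‒18] and iterative identification^[19‒20] are two important parameter identification methods. In recent work, negative gradient search and the key term separation technique were applied to parameter identification of Hammerstein output-error systems, yielding an auxiliary-model recursive gradient algorithm based on key term separation^[21]. Ding et al.^[22] proposed a filtered auxiliary-model hierarchical generalized extended stochastic gradient identification algorithm and a filtered auxiliary-model hierarchical multi-innovation generalized extended stochastic gradient identification algorithm to identify the parameters of Box‒Jenkins systems. Yang et al.^[23] proposed a hierarchical gradient-based iterative algorithm for parameter identification of nonlinear feedback systems to improve identification accuracy. Once accurate parameters are obtained by an identification algorithm, combining them with a state estimation algorithm yields an estimate of the internal state.

The second approach is joint estimation of the model parameters and the state^[24‒25]. In recent work, Aslan et al.^[26] proposed a maximum-likelihood-based particle smoother expectation maximization algorithm that jointly estimates the state and parameters of the hemodynamic model. Abolhasani et al.^[27] combined the system state and the unknown parameters into an augmented state and handled uncertainty with robust regularized least squares, proposing an augmented-state robust regularized least-squares filter; however, if the model and measurement uncertainties are large, the algorithm may be highly conservative. To address this, Marcos et al.^[28] adopted a stochastic viewpoint to reduce conservatism and proposed a Kalman-like filter based on the estimation variation increases uncertainty (EVIU) criterion, called the EVIU filter, which treats the unknown parameters as state increments and performs state estimation with unknown model parameters. However, none of these studies considers unknown process and measurement noise statistics, and estimating parameters and state simultaneously imposes a heavy computational burden. Time-varying and unknown noise significantly affects estimation accuracy. For unknown noise statistics, Wang et al.^[29] recently proposed an adaptive sliding-window outlier-robust KF algorithm based on the Pearson type Ⅶ distribution to achieve joint estimation. That algorithm, however, only considers uncertainty in some of the parameters and does not resolve the high computational complexity and long running time.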

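To address the problems of existing algorithms, a Q-learning-based Kalman filter (QL‒KF) state estimation algorithm is proposed. The model-free nature of Q-learning is used to handle the unknown model parameters: first, a Q function is defined to evaluate the state estimate; next, an estimation policy for obtaining the state estimate is derived from the Q function; then, the information matrix of the Q function is identified by recursive least squares; finally, based on the identification result, the estimation policy is followed to update the state estimate, which achieves state estimation with unknown model parameters.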

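1 State estimation with known model parameters

State estimation with known model parameters is dominated by the KF and its extensions. Taking the KF as an example, the state-space model of a linear discrete-time system is given by Eq. (1) and consists of a system model and a measurement model.

$$x_{k+1} = A_k x_k + \omega_k, \qquad y_k = C_k x_k + \upsilon_k \tag{1}$$

where $x_k$ is the state variable and the subscript $k$ denotes the $k$-th time step; $y_k$ is the observation; $\omega_k$ is the process noise; $\upsilon_k$ is the measurement noise; $A_k$ is the state transition matrix; and $C_k$ is the observation matrix. The process noise $\omega_k$ and the measurement noise $\upsilon_k$ are mutually independent, zero-mean, normally distributed Gaussian white noise sequences with the following statistics.

$$E(\omega_k) = 0,\quad E(\omega_k \omega_j^T) = q_k \delta_{kj};\qquad E(\upsilon_k) = 0,\quad E(\upsilon_k \upsilon_j^T) = r_k \delta_{kj};\qquad E(\omega_k \upsilon_j^T) = 0 \tag{2}$$

where the subscript $j$ denotes the $j$-th time step; $E(\cdot)$ is the expectation of $(\cdot)$; $q_k$ is the covariance matrix of the process noise $\omega_k$; $r_k$ is the covariance matrix of the measurement noise $\upsilon_k$; and $\delta_{kj}$ is the Kronecker $\delta$ function, which satisfies Eq. (3):

$$\delta_{kj} = \begin{cases} 1, & k = j \\ 0, & k \neq j \end{cases} \tag{3}$$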

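The KF obtains the optimal estimate with a set of recursive formulas, Eqs. (4) to (8)^[30], and continually adjusts the Kalman gain to minimize the estimation error covariance.

$$\hat{x}_k^- = A_k \hat{x}_{k-1} \tag{4}$$

$$P_k^- = A_k P_{k-1} A_k^T + q_k \tag{5}$$

$$K_k = P_k^- C_k^T \left(C_k P_k^- C_k^T + r_k\right)^{-1} \tag{6}$$

$$\hat{x}_k = \hat{x}_k^- + K_k \left(y_k - C_k \hat{x}_k^-\right) \tag{7}$$

$$P_k = \left(I - K_k C_k\right) P_k^- \tag{8}$$

In Eqs. (4) to (8), $\hat{x}_k^-$ is the prior estimate; $\hat{x}_k$ is the posterior estimate; $P_k^-$ and $P_k$ are the error covariance matrices between the true value and the prior and posterior estimates, respectively; $I$ is the identity matrix; and $K_k$ is the Kalman gain, $K_k \in [0, C_k^{-1}]$.

Eqs. (4) to (8) show that the posterior estimate is obtained from the prior estimate and the observation, that the prior estimate depends on the model parameters, and that the adjustment of the Kalman gain relies on the two noise covariance matrices. State estimation with the KF therefore requires accurate knowledge of the model parameters and noise statistics; otherwise model errors arise and the filter may diverge^[31]. For complex practical industrial processes, state estimation algorithms that do not depend on the system model parameters or noise statistics are thus urgently needed.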
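The recursion in Eqs. (4) to (8) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation; the function name `kf_step` and all numerical values below are assumptions chosen only to make the example runnable.

```python
import numpy as np

def kf_step(x_prev, P_prev, y, A, C, q, r):
    """One Kalman filter recursion following Eqs. (4) to (8)."""
    x_prior = A @ x_prev                                   # Eq. (4): prior estimate
    P_prior = A @ P_prev @ A.T + q                         # Eq. (5): prior covariance
    S = C @ P_prior @ C.T + r                              # innovation covariance
    K = P_prior @ C.T @ np.linalg.inv(S)                   # Eq. (6): Kalman gain
    x_post = x_prior + K @ (y - C @ x_prior)               # Eq. (7): posterior estimate
    P_post = (np.eye(len(x_prev)) - K @ C) @ P_prior       # Eq. (8): posterior covariance
    return x_post, P_post, K

# Hypothetical model and noise values (not taken from the experiments).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
q = 0.01 * np.eye(2)
r = np.array([[0.5]])
x_hat, P, K = kf_step(np.zeros(2), np.eye(2), np.array([1.0]), A, C, q, r)
```

Every input of `kf_step` other than the observation `y` is a model or noise quantity, which makes the dependence discussed above explicit.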

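2 State estimation with unknown model parameters

State estimation with unknown model parameters adopts a policy iteration method.

Following the requirements of optimal state estimation, for the linear discrete-time system of Eq. (1) the cost function^[32] is defined as follows.

$$c_k = \left(\hat{x}_k - \hat{x}_k^-\right)^T E_k^{-1} \left(\hat{x}_k - \hat{x}_k^-\right) + \left(y_k - C_k \hat{x}_k\right)^T F_k^{-1} \left(y_k - C_k \hat{x}_k\right) \tag{9}$$

where $c_k$ is the cost function, $E_k$ is the covariance matrix of the error $(\hat{x}_k - \hat{x}_k^-)$, and $F_k$ is the covariance matrix of the error $(y_k - C_k \hat{x}_k)$. In practical use of the algorithm, $E_k$ and $F_k$ are two adjustable weight matrices.

Minimizing the cost function keeps the estimate as close as possible to both the prior value and the observation, minimizes its difference from the true value, and thus achieves optimal estimation. With the weight matrices $E_k$ and $F_k$, the Kalman gain $K_k$ can be expressed without knowing the system noise statistics as:

$$K_k = E_k C_k^T \left(C_k E_k C_k^T + F_k\right)^{-1} \tag{10}$$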
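As a small numerical illustration of Eq. (10), the gain can be computed directly from the two weight matrices. The helper name `gain_from_weights` and the values of $E_k$, $F_k$, and $C_k$ are assumptions for this sketch only.

```python
import numpy as np

def gain_from_weights(E, F, C):
    """Kalman gain from the adjustable weight matrices, Eq. (10)."""
    return E @ C.T @ np.linalg.inv(C @ E @ C.T + F)

# Hypothetical weights: a larger E (or a smaller F) pulls the estimate toward the observation.
E = np.diag([2.0, 1.0])
F = np.array([[0.5]])
C = np.array([[1.0, 0.0]])
K = gain_from_weights(E, F, C)   # first entry is 2.0 / (2.0 + 0.5)
```

Eq. (10) has the same algebraic form as Eq. (6) with $P_k^-$ replaced by $E_k$ and $r_k$ replaced by $F_k$, so tuning the two weights plays the role that the noise covariances play in the standard KF.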

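In Kalman estimation, the estimate at the current time determines only the prior value at the next time, which must be combined with the next observation to complete the estimation. In other words, the current estimate only partly influences the next estimate, but it contains the information from all previous times.

On this basis, the quadratic state value function $V_k$ can be written as:

$$V_k = \sum_{i=0}^{k} \gamma^i c_{k-i} = \sum_{i=0}^{k} \gamma^i \left[ \left(\hat{x}_{k-i} - \hat{x}_{k-i}^-\right)^T E_{k-i}^{-1} \left(\hat{x}_{k-i} - \hat{x}_{k-i}^-\right) + \left(y_{k-i} - C_{k-i} \hat{x}_{k-i}\right)^T F_{k-i}^{-1} \left(y_{k-i} - C_{k-i} \hat{x}_{k-i}\right) \right] \tag{11}$$

where $\gamma$ is the discount factor, $\gamma \in [0,1]$.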
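The discounted sum in Eq. (11) can be evaluated either directly or through the recursion $V_k = c_k + \gamma V_{k-1}$ that it implies. The sketch below checks that both give the same number; the function names and the cost values are illustrative assumptions.

```python
def discounted_value(costs, gamma):
    """V_k = sum_{i=0}^{k} gamma**i * c_{k-i}, with costs = [c_0, ..., c_k]."""
    return sum(gamma**i * c for i, c in enumerate(reversed(costs)))

def discounted_value_recursive(costs, gamma):
    """Same quantity, accumulated step by step as V_k = c_k + gamma * V_{k-1}."""
    v = 0.0
    for c in costs:
        v = c + gamma * v
    return v

# Hypothetical per-step costs c_0, c_1, c_2 and discount factor.
costs = [4.0, 2.0, 1.0]
gamma = 0.5
v_sum = discounted_value(costs, gamma)            # 1 + 0.5*2 + 0.25*4
v_rec = discounted_value_recursive(costs, gamma)
```

The recursive form is what makes an incremental update possible: only the newest cost and the previous value are needed at each step.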

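With unknown model parameters, the policy iteration method used to solve the optimal estimation problem consists of policy improvement and policy evaluation. First, the Q function is used to generate an improved estimation policy; next, the Q function is estimated. The two steps alternate, which gives an incremental update of the state estimate.

According to Bellman's principle of optimality^[33], the Q function is defined as:

$$Q_k = c_k + \gamma V_{k-1} = c_k + \gamma Q_{k-1} \tag{12}$$

Substituting Eqs. (9) and (11) into Eq. (12) gives:

$$\begin{aligned} Q_k = c_k + \gamma V_{k-1} \approx{}& \left(\hat{x}_k - \hat{x}_k^-\right)^T E_k^{-1} \left(\hat{x}_k - \hat{x}_k^-\right) + \left(y_k - C_k \hat{x}_k\right)^T F_k^{-1} \left(y_k - C_k \hat{x}_k\right) \\ &+ \gamma \left[ \left(\hat{x}_{k-1} - \hat{x}_{k-1}^-\right)^T E_{k-1}^{-1} \left(\hat{x}_{k-1} - \hat{x}_{k-1}^-\right) + \left(y_{k-1} - C_{k-1} \hat{x}_{k-1}\right)^T F_{k-1}^{-1} \left(y_{k-1} - C_{k-1} \hat{x}_{k-1}\right) \right] \end{aligned} \tag{13}$$

Because information farther from the current time has less influence on the current estimate, Eq. (13) omits the observations and estimates before time $k-1$.

Unlike joint estimation algorithms, the optimization objective of the QL‒KF algorithm is to find the estimation policy that minimizes the cumulative cost, that is, minimizes the Q function, rather than to estimate the system state and parameters simultaneously. Taking the partial derivative of the Q function with respect to the estimate and setting it to zero gives the optimal estimate at which the Q function attains its minimum. Accordingly, the optimal policy for obtaining the state estimate is:

$$\hat{x}_k = \arg\min_{\hat{x}} Q_k \tag{14}$$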

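From the above analysis of the KF, the state estimate is affected by the prior value, and the prior value has to be computed from the system model parameters through Eq. (4). Therefore, when the system model parameters are unknown, Eq. (7) can be rearranged to give:

$$\hat{x}_k^- = \left(I - K_k C_k\right)^{-1} \hat{x}_k - \left(I - K_k C_k\right)^{-1} K_k y_k \tag{15}$$

This rewrites the prior value in a form that depends on the observation and the estimate rather than on the model parameters, so that obtaining the estimate relies only on observations.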
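A quick round-trip check of this rearrangement: apply the measurement update of Eq. (7) to a known prior, then recover the prior from the posterior with Eq. (15). The helper name `prior_from_posterior` and all numbers are assumptions for illustration, and the sketch assumes $I - K_k C_k$ is invertible.

```python
import numpy as np

def prior_from_posterior(x_post, y, K, C):
    """Recover the prior estimate from the posterior via Eq. (15)."""
    alpha = np.linalg.inv(np.eye(len(x_post)) - K @ C)   # (I - K C)^{-1}
    return alpha @ (x_post - K @ y)

# Hypothetical gain, prior, and observation.
C = np.array([[1.0, 0.0]])
K = np.array([[0.4], [0.1]])
x_prior = np.array([2.0, -1.0])
y = np.array([2.5])

x_post = x_prior + K @ (y - C @ x_prior)                 # Eq. (7)
x_prior_rec = prior_from_posterior(x_post, y, K, C)      # Eq. (15)
```

Note that no state transition matrix appears anywhere in the recovery step, which is the point of the transformation.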

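Substituting Eq. (15) into the Q function, expanding it, and writing the result as a quadratic form gives:

$$\begin{aligned} Q_k ={}& \left[\hat{x}_k - \alpha_k \left(\hat{x}_k - K_k y_k\right)\right]^T E_k^{-1} \left[\hat{x}_k - \alpha_k \left(\hat{x}_k - K_k y_k\right)\right] + \left(y_k - C_k \hat{x}_k\right)^T F_k^{-1} \left(y_k - C_k \hat{x}_k\right) \\ &+ \gamma \Big\{ \left[A_k^{-1} \alpha_k \left(\hat{x}_k - K_k y_k\right) - \alpha_{k-1} \left(\hat{x}_{k-1} - K_{k-1} y_{k-1}\right)\right]^T E_{k-1}^{-1} \left[A_k^{-1} \alpha_k \left(\hat{x}_k - K_k y_k\right) - \alpha_{k-1} \left(\hat{x}_{k-1} - K_{k-1} y_{k-1}\right)\right] \\ &+ \left[y_{k-1} - C_k A_k^{-1} \alpha_k \left(\hat{x}_k - K_k y_k\right)\right]^T F_{k-1}^{-1} \left[y_{k-1} - C_k A_k^{-1} \alpha_k \left(\hat{x}_k - K_k y_k\right)\right] \Big\} \\ ={}& \begin{bmatrix} \hat{x}_k \\ \hat{x}_{k-1} \\ y_k \\ y_{k-1} \end{bmatrix}^T H_k \begin{bmatrix} \hat{x}_k \\ \hat{x}_{k-1} \\ y_k \\ y_{k-1} \end{bmatrix} \end{aligned} \tag{16}$$

In the discounted term, $\hat{x}_{k-1}$ is written as $A_k^{-1} \hat{x}_k^-$ by inverting Eq. (4), with $\hat{x}_k^-$ taken from Eq. (15); this assumes $A_k$ is invertible. The quantities appearing in Eq. (16) are:

$$\alpha_k = \left(I - K_k C_k\right)^{-1} \tag{17}$$

$$H_{11,k} = \left(I - \alpha_k\right)^T E_k^{-1} \left(I - \alpha_k\right) + C_k^T F_k^{-1} C_k + \gamma \left(A_k^{-1} \alpha_k\right)^T E_{k-1}^{-1} A_k^{-1} \alpha_k + \gamma \left(C_k A_k^{-1} \alpha_k\right)^T F_{k-1}^{-1} C_k A_k^{-1} \alpha_k \tag{18}$$

$$H_{12,k} = -\gamma \left(A_k^{-1} \alpha_k\right)^T E_{k-1}^{-1} \alpha_{k-1} \tag{19}$$

$$H_{13,k} = \left(I - \alpha_k\right)^T E_k^{-1} \alpha_k K_k - C_k^T F_k^{-1} - \gamma \left(A_k^{-1} \alpha_k\right)^T E_{k-1}^{-1} A_k^{-1} \alpha_k K_k - \gamma \left(C_k A_k^{-1} \alpha_k\right)^T F_{k-1}^{-1} C_k A_k^{-1} \alpha_k K_k \tag{20}$$

$$H_{14,k} = \gamma \left(A_k^{-1} \alpha_k\right)^T E_{k-1}^{-1} \alpha_{k-1} K_{k-1} - \gamma \left(C_k A_k^{-1} \alpha_k\right)^T F_{k-1}^{-1} \tag{21}$$

In Eqs. (16) to (21), $H_k$ is the information matrix, a symmetric positive definite matrix that contains all the information of the prior value; $H_{1i,k}$ is the $i$-th block in the first block row of $H_k$, $i = 1, 2, 3, 4$; and $\alpha_k$ is an intermediate variable.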

Substituting Eqs. (16) to (21) into Eq. (14) gives the estimation policy:

$$\hat{x}_k = -\left(H_{11,k}\right)^{-1} \left(H_{12,k} \hat{x}_{k-1} + H_{13,k} y_k + H_{14,k} y_{k-1}\right) \tag{22}$$
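The policy of Eq. (22) is the stationary point of the quadratic form in Eq. (16) with respect to $\hat{x}_k$. The sketch below applies it to a hypothetical, randomly generated symmetric positive definite $H_k$ (not one computed from Eqs. (18) to (21)) and checks numerically that perturbing the result increases the Q value. The names `estimation_policy` and `q_value` are assumptions.

```python
import numpy as np

def estimation_policy(H, x_prev, y, y_prev):
    """State estimate from the first block row of H, following Eq. (22)."""
    n, m = len(x_prev), len(y)
    H11 = H[:n, :n]
    H12 = H[:n, n:2 * n]
    H13 = H[:n, 2 * n:2 * n + m]
    H14 = H[:n, 2 * n + m:2 * n + 2 * m]
    rhs = H12 @ x_prev + H13 @ y + H14 @ y_prev
    return -np.linalg.solve(H11, rhs)

# Hypothetical symmetric positive definite information matrix (n = 2 states, m = 1 output).
rng = np.random.default_rng(0)
n, m = 2, 1
G = rng.standard_normal((2 * n + 2 * m, 2 * n + 2 * m))
H = G @ G.T + np.eye(2 * n + 2 * m)

x_prev = np.array([0.5, 0.5])
y = np.array([1.0])
y_prev = np.array([0.8])
x_hat = estimation_policy(H, x_prev, y, y_prev)

def q_value(x):
    """Quadratic form z^T H z of Eq. (16) as a function of the current estimate."""
    z = np.concatenate([x, x_prev, y, y_prev])
    return z @ H @ z
```

Using `np.linalg.solve` instead of forming the inverse of $H_{11,k}$ explicitly is a numerical convenience; the result is the same expression as Eq. (22).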

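To obtain the state estimate from the estimation policy, the Q function must be estimated in the policy evaluation stage. Estimating and updating the Q function is the core of Q-learning. In the QL‒KF algorithm, each update involves only the current state, observation, and cost together with the previous state, so the computational complexity is low.

Estimating the Q function can be converted into identifying its information matrix $H_k$, which is done with recursive least squares.

First, the quadratic form is rewritten as:

$$x^T L x = \bar{x}^T \Theta(L) \tag{23}$$

where $\bar{x}$ is the vector formed by all quadratic basis functions of the elements of $x$, with $\bar{x}^T = [x_1^2, x_1 x_2, \cdots, x_1 x_n, x_2^2, x_2 x_3, \cdots, x_2 x_n, \cdots, x_n^2]$; $x_i$ is the $i$-th component of $x$, $i \in [1, n]$; and the vector $\Theta(L)$ consists of the $n$ diagonal elements of $L$ and the $n(n+1)/2 - n$ distinct pairwise sums $L_{ij} + L_{ji}$.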
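The identity in Eq. (23) is easy to verify numerically. In the sketch below, `quad_basis` builds $\bar{x}$ in the ordering given above, `theta_of` builds $\Theta(L)$, and `matrix_from_theta` rebuilds the symmetric matrix corresponding to a parameter vector; all three names are assumptions introduced for illustration.

```python
import numpy as np

def quad_basis(x):
    """x_bar: all products x_i * x_j with i <= j, ordered as in Eq. (23)."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def theta_of(L):
    """Theta(L): diagonal entries and the pairwise sums L_ij + L_ji for i < j."""
    n = L.shape[0]
    return np.array([L[i, i] if i == j else L[i, j] + L[j, i]
                     for i in range(n) for j in range(i, n)])

def matrix_from_theta(theta, n):
    """Rebuild the symmetric matrix whose Theta(.) equals theta."""
    H = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            if i == j:
                H[i, i] = theta[idx]
            else:
                H[i, j] = H[j, i] = theta[idx] / 2.0
            idx += 1
    return H

rng = np.random.default_rng(2)
n = 4
L = rng.standard_normal((n, n))
x = rng.standard_normal(n)
lhs = x @ L @ x
rhs = quad_basis(x) @ theta_of(L)
```

The parameter vector has $n(n+1)/2$ entries, so identifying it is a linear regression problem in the known basis vector $\bar{x}$.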

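Accordingly, from Eqs. (12) and (16), the general form of the online identification model of the cost function is:

$$\begin{aligned} c_k\left(\hat{x}_k, \hat{x}_{k-1}, y_k, y_{k-1}\right) &= Q_k\left(\hat{x}_k, \hat{x}_{k-1}, y_k, y_{k-1}\right) - \gamma Q_{k-1}\left(\hat{x}_{k-1}, \hat{x}_{k-2}, y_{k-1}, y_{k-2}\right) \\ &= \overline{\left[\hat{x}_k, \hat{x}_{k-1}, y_k, y_{k-1}\right]}^T \Theta(H_k) - \gamma\, \overline{\left[\hat{x}_{k-1}, \hat{x}_{k-2}, y_{k-1}, y_{k-2}\right]}^T \Theta(H_k) \\ &= \phi_k^T \theta_k \end{aligned} \tag{24}$$

where $\overline{[\hat{x}_k, \hat{x}_{k-1}, y_k, y_{k-1}]}$ is the vector formed by all quadratic basis functions of the elements of $\hat{x}_k$, $\hat{x}_{k-1}$, $y_k$, and $y_{k-1}$; $\phi_k$ is the information vector, $\phi_k = \overline{[\hat{x}_k, \hat{x}_{k-1}, y_k, y_{k-1}]} - \gamma\, \overline{[\hat{x}_{k-1}, \hat{x}_{k-2}, y_{k-1}, y_{k-2}]}$, which is built from known information; and $\theta_k$ is the parameter vector to be identified, $\theta_k = \Theta(H_k)$, which contains all the unknown information and is identified with the following recursive least squares formulas.

$$\begin{aligned} e_k(s) &= c_k - \phi_k^T \hat{\theta}_k(s-1), \\ \hat{\theta}_k(s) &= \hat{\theta}_k(s-1) + P_k(s-1) \phi_k e_k(s) \left[1 + \phi_k^T P_k(s-1) \phi_k\right]^{-1}, \\ P_k(s) &= P_k(s-1) - P_k(s-1) \phi_k \phi_k^T P_k(s-1) \left[1 + \phi_k^T P_k(s-1) \phi_k\right]^{-1}, \\ P_k(0) &= P_0 \end{aligned} \tag{25}$$

where $e_k$ is the error of the cost function; $\hat{\theta}_k(s)$ is the $s$-th estimate of $\theta_k$; $P_0$ is the initial value of $P_k$, $P_0 = \beta I$, with $\beta$ a large positive constant; and $s$ is the number of time steps used to identify $H_k$.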
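The recursive least squares update of Eq. (25) can be sketched as below. The function name `rls_update`, the synthetic noise-free regression data, and the choice $\beta = 10^6$ are assumptions made only to show the recursion converging; they are not the settings used in the experiments.

```python
import numpy as np

def rls_update(theta, P, phi, c):
    """One recursive least squares step following Eq. (25)."""
    e = c - phi @ theta                                   # cost prediction error e_k(s)
    denom = 1.0 + phi @ P @ phi
    theta_new = theta + (P @ phi) * e / denom             # parameter update
    P_new = P - np.outer(P @ phi, phi @ P) / denom        # covariance-like matrix update
    return theta_new, P_new, e

# Synthetic regression c = phi^T theta_true with a hypothetical theta_true.
rng = np.random.default_rng(1)
theta_true = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
P = 1e6 * np.eye(3)                                       # P_0 = beta * I, beta large
for _ in range(50):
    phi = rng.standard_normal(3)
    c = phi @ theta_true
    theta, P, e = rls_update(theta, P, phi, c)
```

In the QL‒KF setting, `phi` would be the information vector $\phi_k$ of Eq. (24) and `c` the cost $c_k$ of Eq. (9); the identified `theta` is then mapped back to the blocks of $H_k$.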

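Using the identified $H_k$ and following the optimal policy, the following operation is performed:

$$\hat{x}_{k+1} = -\left(H_{11,k}\right)^{-1} \left(H_{12,k} \hat{x}_k + H_{13,k} y_{k+1} + H_{14,k} y_k\right) \tag{26}$$

which gives the optimal estimate of the state at the next time.

In summary, given a sequence of system observations, state estimation can be achieved without using the system model parameters and without knowing the system noise statistics, simply by assigning random initial values to the state estimate and the Kalman gain.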

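3 Simulation verification

To verify the effectiveness and feasibility of the proposed algorithm when the model parameters and noise statistics are unknown, the algorithm is applied to estimating the state of a two-state polynomial system and the water levels of a quadruple-tank system, and the simulation results are reported. The root mean square error (RMSE, denoted $E_{\mathrm{RMSE}}$) and the average root mean square error (ARMSE, denoted $E_{\mathrm{ARMSE}}$) are used as performance metrics.

The RMSE of the state estimate at time $k$ is:

$$E_{\mathrm{RMSE}}(k) = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left(x_{k,j}^{i} - \hat{x}_{k,j}^{i}\right)^2} \tag{27}$$

where $M$ is the number of Monte Carlo runs, $x_{k,j}^{i}$ is the value of the $j$-th component of $x_k$ in the $i$-th Monte Carlo run, and $j$ indexes the state variables, that is, $x_k = [x_{k,1}, x_{k,2}, \cdots, x_{k,j}]$.

The ARMSE is defined as:

$$E_{\mathrm{ARMSE}} = \frac{1}{N} \sum_{i=1}^{N} E_{\mathrm{RMSE}}(i) \tag{28}$$

where $N$ is the number of RMSE values.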
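The two metrics can be computed as follows for one state component. The array layout (Monte Carlo runs along the first axis, time steps along the second) and the function names are assumptions of this sketch, and the toy numbers are illustrative only.

```python
import numpy as np

def rmse_per_step(x_true, x_est):
    """RMSE over Monte Carlo runs at each time step, Eq. (27).

    x_true, x_est: arrays of shape (M, T) for a single state component.
    Returns an array of shape (T,).
    """
    return np.sqrt(np.mean((x_true - x_est) ** 2, axis=0))

def armse(rmse):
    """Average of the per-step RMSE values, Eq. (28)."""
    return float(np.mean(rmse))

# Toy data: M = 2 runs, T = 3 time steps.
x_true = np.zeros((2, 3))
x_est = np.array([[3.0, 0.0, 1.0],
                  [4.0, 0.0, 1.0]])
r = rmse_per_step(x_true, x_est)   # [sqrt(12.5), 0, 1]
a = armse(r)
```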

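3.1 Two-state polynomial system

3.1.1 Gaussian noise

The QL‒KF algorithm is applied to the two-state polynomial system:

$$x_{k+1} = A_k x_k + \omega_k,\ \ \omega_k \sim N(0, q_k); \qquad y_k = C_k x_k + \upsilon_k,\ \ \upsilon_k \sim N(0, r_k) \tag{29}$$

where $A_k = \begin{bmatrix} 1.618 & 1 \\ -0.618 & 0 \end{bmatrix}$ and $C_k = \begin{bmatrix} 1 & 0 \end{bmatrix}$.

Both the process noise and the measurement noise are assumed to be zero-mean Gaussian white noise, with $q_k = I$ and $r_k = 12$. To characterize the QL‒KF algorithm, uncertainty is imposed on the system model parameters. Table 1 lists the simulation settings for the two-state polynomial system. To increase the credibility of the simulation, Monte Carlo experiments are used with $M = 30$ runs, and the proposed algorithm is compared with the standard KF and with a joint parameter and state estimation algorithm, represented here by the EVIU algorithm^[28].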

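Because Eq. (24) contains the state estimates at three times, $\hat{x}_k$, $\hat{x}_{k-1}$, and $\hat{x}_{k-2}$, the algorithm starts at time step $k = 3$, and initial values are assigned to $\hat{x}_1$, $\hat{x}_2$, and $\hat{x}_3$. The QL‒KF procedure starts from $\hat{x}_1 = [0.3, 0.3]$, $\hat{x}_2 = [0.5, 0.5]$, and $\hat{x}_3 = [1, 1]$, and estimates the state over 200 time steps. Figure 1 shows the RMSE of each algorithm for the Gaussian-noise two-state polynomial system under different parameter changes. Table 2 compares the performance of the three algorithms under Gaussian noise.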
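For readers who want to reproduce the setting, data from the system in Eq. (29) can be generated along the following lines. This is a sketch under stated assumptions: the true initial state `x0`, the random seed, and the measurement noise variance passed as `r` are placeholders (the paper's own value of $r_k$ should be substituted), and the parameter changes described in Table 1 are not modeled.

```python
import numpy as np

def simulate(A, C, q, r, x0, steps, rng):
    """Generate states and observations of the linear system in Eq. (29)."""
    n, m = A.shape[0], C.shape[0]
    xs = np.zeros((steps, n))
    ys = np.zeros((steps, m))
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        xs[k] = x
        ys[k] = C @ x + rng.multivariate_normal(np.zeros(m), r)   # measurement model
        x = A @ x + rng.multivariate_normal(np.zeros(n), q)       # system model
    return xs, ys

A = np.array([[1.618, 1.0], [-0.618, 0.0]])
C = np.array([[1.0, 0.0]])
q = np.eye(2)
r = np.array([[1.0]])          # placeholder variance, assumed for this sketch only
rng = np.random.default_rng(0)
xs, ys = simulate(A, C, q, r, x0=[0.0, 0.0], steps=200, rng=rng)
```

Only `ys` would be passed to a model-free estimator; `xs` is kept to evaluate the RMSE afterwards.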

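As Fig. 1 and Tables 1 and 2 show, with time-varying and unknown model parameters, the RMSE of the QL‒KF algorithm shows a good convergence trend, which demonstrates the effectiveness of the algorithm. The randomly assigned initial estimates may make the RMSE of the QL‒KF algorithm slightly larger at the initial time. In addition, because Q-learning needs some data accumulation, the RMSE fluctuates slightly at small time steps, but it decreases as the number of iterations grows and gradually stabilizes. At the 101st time step the parameters change and the RMSE of the QL‒KF algorithm increases, but the increase is small, the RMSE quickly returns to a low level, and it finally settles near a fixed value, showing strong robustness.

Comparing the standard KF with the QL‒KF algorithm, the RMSE of the standard KF is small and very steady when the model parameters are known. When the model parameters change, however, and neither algorithm knows them, the proposed QL‒KF algorithm is clearly more accurate than the standard KF and more robust.

Compared with the EVIU algorithm, which likewise has no knowledge of the model parameters, the two have comparable accuracy within the first 100 time steps. At the 101st time step, when the parameters change, the RMSE of the EVIU algorithm increases more. As the number of iterations grows, the RMSE of the QL‒KF algorithm after stabilization is smaller and steadier. Moreover, because the QL‒KF algorithm updates the Q function and the estimation policy only once per time step, and does so incrementally, its computational load is small; the EVIU algorithm, by contrast, processes the new observation in full at every time step and updates both the state and the parameter estimates, which requires a large amount of computation in every iteration.

The QL‒KF algorithm therefore runs faster than the EVIU algorithm. In summary, the ARMSE of the proposed QL‒KF algorithm is about 34.66% lower on average than that of the EVIU algorithm, so its estimation accuracy is higher; it is more robust to parameter uncertainty; and its average running time is short, comparable to that of the standard KF and about 44.38% lower than that of the EVIU algorithm, so its real-time performance is high.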

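3.1.2 Non-Gaussian noise

Uniform noise is taken as an example to examine the estimation performance of the QL‒KF algorithm for a system with non-Gaussian noise.

The process noise and the measurement noise of the two-state polynomial system are both assumed to be uniform noise with mean $(0 + 10)/2$ and variance $(10 - 0)^2/12$. The experiment again uses the simulation settings in Table 1, with $M = 30$ Monte Carlo runs. The QL‒KF procedure starts from $\hat{x}_1 = [0.3, 0.3]$, $\hat{x}_2 = [0.5, 0.5]$, and $\hat{x}_3 = [1, 1]$, and estimates the state over 200 time steps. Figure 2 shows the RMSE of each algorithm for the non-Gaussian-noise two-state polynomial system under different parameter changes. Table 3 compares the performance of the three algorithms under non-Gaussian noise.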
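The stated mean and variance are those of a uniform distribution on the interval $[a, b] = [0, 10]$. As a worked check using the standard formulas for $U(a, b)$:

$$\mu = \frac{a + b}{2} = \frac{0 + 10}{2} = 5, \qquad \sigma^2 = \frac{(b - a)^2}{12} = \frac{(10 - 0)^2}{12} = \frac{100}{12} \approx 8.33$$

Unlike the Gaussian case of Eq. (2), this noise is not zero-mean, so it departs from the standard KF assumptions in both its distribution and its mean.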

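As Fig. 2 and Tables 1 and 3 show, the QL‒KF algorithm has a degree of robustness to uniform noise, and its RMSE still converges. With enough accumulated data, the estimator gradually adjusts its state estimate so that the error decreases step by step and reaches a steady state after long-term operation. Non-Gaussian noise does, however, increase the estimation error of the QL‒KF algorithm: the ARMSE averaged over the different parameter changes is about 14.28. The estimation accuracy of the algorithm is higher than that of the standard KF and comparable to that of the EVIU algorithm. Its average running time is about 45.03% lower than that of the EVIU algorithm, so it retains high real-time performance.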

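3.2 Quadruple-tank system

This section uses a quadruple-tank system^[34] to verify the effectiveness of the algorithm. Figure 3 gives an overview of the quadruple-tank system. As Fig. 3 shows, two pumps deliver water to different tanks through two split flows, and tanks 3 and 4 feed tanks 1 and 2, respectively. Water in an upper tank can only drain into the tank below it, and water discharged from the bottom of the tanks flows directly into a large reservoir. The system state vector is defined as follows.

$$x_k = \left[h_1(k) - s_1,\ h_2(k) - s_2,\ h_3(k) - s_3,\ h_4(k) - s_4\right]^T \tag{30}$$

where $h_i(k)$ is the water level of tank $i$, $i \in [1, 4]$; and $s_i$ is the set point of the water level of tank $i$, $i \in [1, 4]$, with $s_1 = 0.1226$ m, $s_2 = 0.2602$ m, $s_3 = 0.0101$ m, and $s_4 = 0.0408$ m.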

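The state-space model of the quadruple-tank system can then be described by Eq. (1), with the parameters:

$$A_k = \begin{bmatrix} 0.9321 & 0 & 0.4173 & 0 \\ 0 & 0.9395 & 0 & 0.1123 \\ 0 & 0 & 0.8613 & 0 \\ 0 & 0 & 0 & 0.9090 \end{bmatrix}, \qquad C_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

Both the process noise and the measurement noise are assumed to be zero-mean Gaussian white noise, with $q_k = \mathrm{diag}(10^{-4}, 10^{-4}, 10^{-4}, 10^{-4})$ and $r_k = I$.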
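The model matrices can be entered and sanity-checked numerically. The sketch below assumes that the observation matrix is the $2 \times 4$ matrix that measures the levels of tanks 1 and 2 (the flattened digits in the source are read that way here), and it checks two properties that a state estimator relies on: stability of $A_k$ and observability of the pair $(A_k, C_k)$.

```python
import numpy as np

A = np.array([[0.9321, 0.0,    0.4173, 0.0],
              [0.0,    0.9395, 0.0,    0.1123],
              [0.0,    0.0,    0.8613, 0.0],
              [0.0,    0.0,    0.0,    0.9090]])
C = np.array([[1.0, 0.0, 0.0, 0.0],     # assumed layout: tank 1 level measured
              [0.0, 1.0, 0.0, 0.0]])    # assumed layout: tank 2 level measured
q = 1e-4 * np.eye(4)
r = np.eye(2)

# A is upper triangular, so its eigenvalues are its diagonal entries.
spectral_radius = max(abs(np.linalg.eigvals(A)))

# Observability matrix [C; CA; CA^2; CA^3] and its rank.
obs = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(4)])
obs_rank = np.linalg.matrix_rank(obs)
```

Under this reading of $C_k$, the spectral radius is below one and the observability matrix has full rank, so the unmeasured levels of tanks 3 and 4 can in principle be inferred from the two measured levels.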

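Uncertainty is imposed on the system model parameters. Table 4 lists the simulation settings for the quadruple-tank system. The number of Monte Carlo runs is set to $M = 30$. The procedure starts from $\hat{x}_1 = [0.3, 0.3, 0.3, 0.3]$, $\hat{x}_2 = [0.5, 0.5, 0.5, 0.5]$, and $\hat{x}_3 = [1, 1, 1, 1]$, and estimates the state over 1000 time steps. Figure 4 shows the RMSE of the two algorithms for the quadruple-tank system. Table 5 compares the performance of the two algorithms on the quadruple-tank system.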

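As Fig. 4 and Tables 4 and 5 show, the RMSE of the QL‒KF algorithm for both state components behaves well, which verifies the effectiveness of the algorithm. Specifically, the RMSE of the QL‒KF algorithm is large at the initial time but converges quickly afterwards; in later time steps, parameter uncertainty makes the RMSE increase, but it then decreases rapidly, converges, and gradually approaches a steady state as the parameters settle. The EVIU algorithm, by contrast, is strongly affected by parameter uncertainty. Compared with it, the QL‒KF algorithm is more robust, is more accurate after stabilization, and has a steadier RMSE; its ARMSE is 79.93% lower on average and its average running time is 47.78% lower, which effectively improves real-time performance.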

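4 Conclusions

To overcome the limitation that Kalman state estimation requires exactly known system model parameters and noise statistics, this paper proposes a Kalman state estimation algorithm based on Q-learning. Given only random initial estimates of the state, the algorithm achieves state estimation when the system model parameters and noise statistics are unknown. Simulation results show that the proposed QL‒KF algorithm can estimate the internal state of a system from observations alone, without identifying the system parameters, when the model parameters and noise statistics are unknown. The estimation performance of the algorithm depends on the type of system noise. For systems with Gaussian noise, the algorithm has high estimation accuracy, strong robustness, and high real-time performance. For systems with non-Gaussian noise, the estimation accuracy decreases; future work will consider improving it further.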

References

[1]	Kordestani M, Safavi A A, Saif M. Recent survey of large-scale systems: Architectures, controller strategies, and industrial applications[J]. IEEE Systems Journal, 2021, 15(4): 5440‒5453. doi:10.1109/jsyst.2020.3048951

[2]	Tabacek J, Havlena V. Reduction of prediction error sensitivity to parameters in Kalman filter[J]. Journal of the Franklin Institute, 2022, 359(3): 1303‒1326. doi:10.1016/j.jfranklin.2021.12.019

[3]	Chen Yifan, Han Haiqian, Zhang Yi, et al. Dynamic inversion of hydrodynamic parameters of plain river network[J]. Advanced Engineering Sciences, 2019, 51(2): 13‒20. (陈一帆,韩海骞,张翼,等.城镇化平原河网水动力参数动态反演[J].工程科学与技术,2019,51(2):13‒20.)

[4]	Li Qinwen, Wang Zhiqian, Wang Wenrui, et al. A model predictive obstacle avoidance method based on dynamic motion primitives and a Kalman filter[J]. Asian Journal of Control, 2023, 25(2): 1510‒1525. doi:10.1002/asjc.2946

[5]	Miao Kelei, Zhang Wenan, Qiu Xiang. An adaptive unscented Kalman filter approach to secure state estimation for wireless sensor networks[J]. Asian Journal of Control, 2023, 25(1): 629‒636. doi:10.1002/asjc.2783

[6]	Park G. Optimal vehicle position estimation using adaptive unscented Kalman filter based on sensor fusion[J]. Mechatronics, 2024, 99: 103144. doi:10.1016/j.mechatronics.2024.103144

[7]	Lu Xin, Liu Zhong, Zhang Hongxin, et al. A Gaussian nonlinear iterated update filter[J]. Advanced Engineering Sciences, 2017, 49(4): 111‒118. (陆欣,刘忠,张宏欣,等.一种高斯型非线性迭代更新滤波器[J].工程科学与技术,2017,49(4):111‒118.)

[8]	Pan Cheng, Gao Jingxiang, Li Zengke, et al. Multiple fading factors-based strong tracking variational Bayesian adaptive Kalman filter[J]. Measurement, 2021, 176: 109139. doi:10.1016/j.measurement.2021.109139

[9]	Zeng Jingcong, Shi Yuanfeng, Dai Kaoshan, et al. Real-time structural displacement estimation by fusing acceleration and displacement data with adaptive Kalman filter[J]. Advanced Engineering Sciences, 2023, 55(4): 188‒196. (曾竞骢,施袁锋,戴靠山,等.基于自适应卡尔曼滤波加速度与位移融合的结构位移实时估计[J].工程科学与技术,2023,55(4):188‒196.)

[10]	Dong Xiangxiang, Battistelli G, Chisci L, et al. A variational Bayes moving horizon estimation adaptive filter with guaranteed stability[J]. Automatica, 2022, 142: 110374. doi:10.1016/j.automatica.2022.110374

[11]	Narasimhappa M, Mahindrakar A D, Guizilini V C, et al. MEMS-based IMU drift minimization: Sage husa adaptive robust Kalman filtering[J]. IEEE Sensors Journal, 2020, 20(1): 250‒260. doi:10.1109/jsen.2019.2941273

[12]	Tripathi R P, Singh A K, Gangwar P. Innovation-based fractional order adaptive Kalman filter[J]. Journal of Electrical Engineering, 2020, 71(1): 60‒64. doi:10.2478/jee-2020-0009

[13]	Jiang Liuyang, Zhang Hai. Redundant measurement-based second order mutual difference adaptive Kalman filter[J]. Automatica, 2019, 100: 396‒402. doi:10.1016/j.automatica.2018.11.037

[14]	Liu Tong, Zhang Zengjie, Liu Fangzhou, et al. Adaptive observer for a class of systems with switched unknown parameters using DREM[J]. IEEE Transactions on Automatic Control, 2024, 69(4): 2445‒2452. doi:10.1109/tac.2023.3309228

[15]	Zhang Xianku, Zhao Baigang, Zhang Guoqing. Improved parameter identification algorithm for ship model based on nonlinear innovation decorated by sigmoid function[J]. Transportation Safety and Environment, 2021, 3(2): 114‒122. doi:10.1093/tse/tdab006

[16]	Zhao Baigang, Zhang Xianku. An improved nonlinear innovation-based parameter identification algorithm for ship models[J]. Journal of Navigation, 2021, 74(3): 549‒557. doi:10.1017/s0373463321000102

[17]	Shi Zhenwei, Yang Haodong, Dai Mei. The data-filtering based bias compensation recursive least squares identification for multi-input single-output systems with colored noises[J]. Journal of the Franklin Institute, 2023, 360(7): 4753‒4783. doi:10.1016/j.jfranklin.2023.01.040

[18]	Xu Huan, Ding Feng, Yang Erfu. Modeling a nonlinear process using the exponential autoregressive time series model[J]. Nonlinear Dynamics, 2019, 95(3): 2079‒2092. doi:10.1007/s11071-018-4677-0

[19]	Ge Zhengwei, Ding Feng, Xu Ling, et al. Gradient-based iterative identification method for multivariate equation-error autoregressive moving average systems using the decomposition technique[J]. Journal of the Franklin Institute, 2019, 356(3): 1658‒1676. doi:10.1016/j.jfranklin.2018.12.002

[20]	You Junyao, Liu Yanjun, Chen Jing, et al. Iterative identification for multiple-input systems with time-delays based on greedy pursuit and auxiliary model[J]. Journal of the Franklin Institute, 2019, 356(11): 5819‒5833. doi:10.1016/j.jfranklin.2019.03.018

[21]	Lv Lei, Sun Wei, Pan Jian. Two-stage and three-stage recursive gradient identification of Hammerstein nonlinear systems based on the key term separation[J]. International Journal of Robust and Nonlinear Control, 2024, 34(2): 829‒848. doi:10.1002/rnc.7007

[22]	Ding Feng, Xu Ling, Zhang Xiao, et al. Recursive identification methods for general stochastic systems with colored noises by using the hierarchical identification principle and the filtering identification idea[J]. Annual Reviews in Control, 2024, 57: 100942. doi:10.1016/j.arcontrol.2024.100942

[23]	Yang Dan, Liu Yanjun, Ding Feng, et al. Hierarchical gradient-based iterative parameter estimation algorithms for a nonlinear feedback system based on the hierarchical identification principle[J]. Circuits, Systems, and Signal Processing, 2024, 43(1): 124‒151. doi:10.1007/s00034-023-02477-1

[24]	Patil P V, Vachhani L, Ravitharan S, et al. Sequential state and unknown parameter estimation strategy and its application to a sensor fusion problem[J]. IEEE Sensors Journal, 2022, 22(21): 20665‒20675. doi:10.1109/jsen.2022.3199214

[25]	Molaei A, Nikoofard A, Sedigh A K, et al. Parameter and state estimation of managed pressure drilling system using the optimization-based supervisory framework[J]. IEEE Transactions on Control Systems Technology, 2023, 31(6): 2937‒2944. doi:10.1109/tcst.2023.3273192

[26]	Aslan S, Cemgil A T, Akın A. Joint state and parameter estimation of the hemodynamic model by particle smoother expectation maximization method[J]. Journal of Neural Engineering, 2016, 13(4): 046010. doi:10.1088/1741-2560/13/4/046010

[27]	Abolhasani M, Rahmani M. Robust deterministic least-squares filtering for uncertain time-varying nonlinear systems with unknown inputs[J]. Systems & Control Letters, 2018, 122: 1‒11. doi:10.1016/j.sysconle.2018.09.005

[28]	Fernandes M R, do Val J B R, Souto R F. Robust estimation and filtering for poorly known models[J]. IEEE Control Systems Letters, 2020, 4(2): 474‒479. doi:10.1109/lcsys.2019.2951611

[29]	Wang Ke, Wu Panlong, Li Xingxiu, et al. An adaptive outlier-robust Kalman filter based on sliding window and Pearson type Ⅶ distribution modeling[J]. Signal Processing, 2024, 216: 109306. doi:10.1016/j.sigpro.2023.109306

[30]	Hua Jinxing, Liu Ruirui, Hao Fei. Two-channel false data injection attacks on multi-sensor remote state estimation[J]. Asian Journal of Control, 2023, 25(5): 3776‒3791. doi:10.1002/asjc.3067

[31]	Ge Quanbo, Ma Zhongcheng, Li Jinglan, et al. Adaptive cubature Kalman filter with the estimation of correlation between multiplicative noise and additive measurement noise[J]. Chinese Journal of Aeronautics, 2022, 35(5): 40‒52. doi:10.1016/j.cja.2021.05.004

[32]	Xue Wei, Luan Xiaoli, Zhao Shunyi, et al. An online performance index for the Kalman filter[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: 1007912. doi:10.1109/tim.2022.3212114

[33]	Zhang Tengfei, Jia Yingmin. Input-constrained optimal output synchronization of heterogeneous multiagent systems via observer-based model-free reinforcement learning[J]. Asian Journal of Control, 2024, 26(1): 98‒113. doi:10.1002/asjc.3183

[34]	Zhao Shunyi, Shmaliy Y S, Ahn C K, et al. Self-tuning unbiased finite impulse response filtering algorithm for processes with unknown measurement noise covariance[J]. IEEE Transactions on Control Systems Technology, 2021, 29(3): 1372‒1379. doi:10.1109/tcst.2020.2991609