Rapid detection of gentiopicrin content in Gentiana macrophylla based on a random forest algorithm
Objective To achieve rapid, accurate, and non-destructive determination of gentiopicrin content in Gentiana macrophylla using near-infrared (NIR) spectroscopy and a random forest algorithm. Methods The gentiopicrin content of G. macrophylla was determined by HPLC. The original spectra were preprocessed by orthogonal signal correction combined with wavelet compression, and the extracted wavelet coefficients were used as spectral features to build a random forest quantitative analysis model relating the NIR spectra to gentiopicrin content. The prediction results of four models were also compared. Results With orthogonal signal correction alone as preprocessing, the partial least squares regression model gave a root mean square error of prediction (RMSEP) of 0.2469 and a coefficient of determination (R²) of 0.9368 on the validation set, and the random forest model gave an RMSEP of 0.2075 and an R² of 0.9695. When the orthogonal-signal-corrected spectra were further decomposed by discrete wavelet transform and 63 low- and medium-frequency wavelet coefficients were extracted as model inputs, the partial least squares regression model gave an RMSEP of 0.2126 and an R² of 0.9503 on the validation set, and the random forest model gave an RMSEP of 0.1663 and an R² of 0.9804.
Conclusion Wavelet multi-scale decomposition reduced the correlation among the decision trees and further improved the generalization ability and robustness of the random forest quantitative analysis model. The model can be used for the rapid and accurate determination of gentiopicrin content in G. macrophylla.
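The wavelet compression step and the two reported metrics can be illustrated with a minimal sketch. The abstract does not state which wavelet, how many decomposition levels, or which bands were kept, so the Haar wavelet, the default of three levels, and the function names `haar_step`, `wavelet_features`, `rmsep`, and `r_squared` are all assumptions made for illustration. The orthogonal signal correction and random forest steps are not shown.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT (assumed wavelet, not from the paper).

    Returns (approximation, detail). A trailing sample is dropped
    if the input length is odd.
    """
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


def wavelet_features(spectrum, levels=3, keep_details=1):
    """Compress a spectrum to its low- and medium-frequency coefficients.

    Decomposes `levels` times, then keeps the final approximation band plus
    the `keep_details` coarsest detail bands. The finer (high-frequency)
    detail bands are discarded. Both parameters are hypothetical defaults.
    """
    approx = [float(v) for v in spectrum]
    details = []  # details[0] is the finest (highest-frequency) band
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)
    kept = details[max(0, len(details) - keep_details):]
    features = list(approx)
    for band in reversed(kept):  # coarsest detail band first
        features.extend(band)
    return features


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a validation set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Usage on a synthetic 64-point "spectrum" (not real NIR data).
spectrum = [math.sin(0.1 * k) + 0.01 * (k % 3) for k in range(64)]
features = wavelet_features(spectrum, levels=3, keep_details=1)
print(len(spectrum), "->", len(features), "coefficients")
```

In a full pipeline, a feature vector like this would be computed for each preprocessed spectrum and passed to the regression model, and `rmsep` and `r_squared` would be evaluated on the held-out validation samples.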
Gentiana macrophylla / near infrared spectroscopy / gentiopicrin / random forest / wavelet transform
Gansu Provincial Science and Technology Program (21JR1RA272)
Lanzhou Science and Technology Program (2018-3-41)