1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
2. Anhui Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Hefei 230031, China
3. School of Internet, Anhui University, Hefei 230039, China
4. Department of Electronics, Hefei University, Hefei 230061, China
Abstract: In recent years, deep learning has attracted growing attention in data mining, and ensemble (integrated) learning algorithms have been increasingly applied to classification and quantitative regression, but ensemble learning has seldom been applied to infrared spectrum analysis. This paper proposes an ensemble learning quantitative regression algorithm based on the Blending model. The GBDT algorithm, a linear-kernel support vector machine (LinearSVM) and a radial-kernel support vector machine (RBF SVM) serve as the base learners, and their predictions are fused by a LinearSVM. The spectral data were preprocessed with the first derivative. The predictions of the GBDT, LinearSVM, RBF SVM and Blending ensemble models were then analyzed and compared. For the content of active substance and for hardness, the RBF SVM model is the best, with the highest R², the smallest RMSEP and the largest RPD, and the GBDT model is the worst. For tablet quality, the Blending model gives the highest R², 0.8374, while RBF SVM gives the lowest RMSEP, 2.1406, and the largest RPD, 7.4878. For the boiling point, flash point and total aromatics of diesel oil, the Blending model is the best and outperforms every single model. For the cetane number, the GBDT and RBF SVM models are better than the Blending model. For density, both the single models and the ensemble model predict well: except for LinearSVM, whose R² is 0.9445, every model reaches an R² above 0.99. For the freezing point, RBF SVM and LinearSVM are both better than the Blending model. For viscosity, only RBF SVM is better than the Blending model.
The results show that the Blending model integrates the characteristics of the GBDT, LinearSVM and RBF SVM models and that, compared with the single models, its predictions are better or optimal. The Blending ensemble learning model is therefore well suited to infrared quantitative regression, with high prediction accuracy and generalization ability. This is of great significance for further research on the application of ensemble learning algorithms to infrared quantitative regression.
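The workflow summarized in the abstract (first-derivative preprocessing, three base learners, fusion of their predictions by a linear SVM, and evaluation by R², RMSEP and RPD) can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. The function names, the 30% holdout split, the default hyperparameters and the finite-difference derivative are all assumptions made for illustration, SVR(kernel="linear") stands in for LinearSVM, and RPD is taken in its common form as the standard deviation of the reference values divided by RMSEP.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def first_derivative(spectra):
    """Finite-difference first derivative along the wavelength axis (assumed method)."""
    return np.gradient(np.asarray(spectra, dtype=float), axis=1)


def fit_blending(X, y, holdout=0.3, seed=0):
    """Fit the base learners on one split, then fuse their holdout predictions."""
    X_fit, X_hold, y_fit, y_hold = train_test_split(
        X, y, test_size=holdout, random_state=seed
    )
    # Base learners named in the abstract; hyperparameters are library defaults.
    base = [
        GradientBoostingRegressor(random_state=seed),
        SVR(kernel="linear"),
        SVR(kernel="rbf"),
    ]
    for model in base:
        model.fit(X_fit, y_fit)
    # Each base learner contributes one column of holdout predictions.
    meta_features = np.column_stack([m.predict(X_hold) for m in base])
    # A linear-kernel SVM learns how to combine the three predictions.
    meta = SVR(kernel="linear").fit(meta_features, y_hold)
    return base, meta


def predict_blending(base, meta, X):
    """Predict with every base learner, then pass the results to the meta learner."""
    meta_features = np.column_stack([m.predict(X) for m in base])
    return meta.predict(meta_features)


def regression_metrics(y_true, y_pred):
    """Return R2, RMSEP and RPD (RPD assumed to be SD of y_true over RMSEP)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    rmsep = np.sqrt(np.mean(residual ** 2))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = np.std(y_true, ddof=1) / rmsep
    return {"R2": r2, "RMSEP": rmsep, "RPD": rpd}
```

In use, the calibration spectra would be passed through first_derivative before fit_blending, and the prediction-set spectra would receive the same preprocessing before predict_blending and regression_metrics. Keeping the holdout split separate from the data used to fit the base learners is what distinguishes blending from simply averaging the three models.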
Key words: Integrated learning; Support vector machine; GBDT; Quantitative regression
蒋薇薇,鲁昌华,张玉钧,鞠 薇,汪济洲,偶春生,肖明霞. 集成学习算法的红外光谱定量回归模型[J]. 光谱学与光谱分析, 2021, 41(04): 1119-1124.
JIANG Wei-wei, LU Chang-hua, ZHANG Yu-jun, JU Wei, WANG Ji-zhou, OU Chun-sheng, XIAO Ming-xia. Research on a Quantitative Regression Model of the Infrared Spectrum Based on the Integrated Learning Algorithm. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2021, 41(04): 1119-1124.