Research on Quantitative Regression Method of IR Spectra of Organic Compounds Based on Ensemble Learning With Wavelength Selection
JU Wei1, LU Chang-hua2, 3, ZHANG Yu-jun3, CHEN Xiao-jing1, JIANG Wei-wei2*
1. School of Internet, Anhui University, Hefei 230039, China
2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
3. Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
Abstract: This work studies the application of ensemble learning to the quantitative analysis of infrared spectra of organic compounds, and the influence of characteristic wavelength selection on the modeling efficiency and prediction accuracy of infrared spectra ensemble learning. Taking the cetane number and total aromatic hydrocarbon content predicted from diesel infrared spectra as the research objects, a two-layer Stacking ensemble learning framework is first established, using extremely randomized trees (ERT), linear kernel support vector machine (LinearSVM), radial basis kernel support vector machine (RBFSVM) and polynomial kernel support vector machine (polySVM) as base learners and LinearSVM as the meta-learner. The quantitative regression accuracy of the single base learners and of the ensemble learning model on the diesel infrared spectra is analyzed and compared. Compared with the partial least squares (PLS) quantitative regression model, the Stacking ensemble learning model improves the prediction accuracy for both target quantities. For cetane number, the ERT model gives the best prediction (r=0.848, RMSEP=1.603, RDP=2.627); for total aromatic content, the Stacking model gives the best prediction (r=0.991, RMSEP=0.645, RDP=9.243). Further, the characteristic wavelengths of the infrared spectra are selected using synergy interval partial least squares (SiPLS) and the successive projections algorithm (SPA), and the ensemble learning quantitative regression models are rebuilt on the selected wavelengths. Among these, the SiPLS-ERT model gives the best prediction for cetane number (r=0.893, RMSEP=1.013, RDP=3.051), and the SiPLS-Stacking model gives the best prediction for total aromatic content (r=0.998, RMSEP=0.354, RDP=11.475). The average training time of the models is reduced by more than 50% compared with training on the full spectra, so the modeling speed is significantly improved.
The results show that characteristic wavelength selection combined with ensemble learning regression can be used for the quantitative analysis of infrared spectra of organic compounds. Compared with the traditional quantitative regression method (PLS), both the modeling efficiency and the prediction accuracy are improved, which provides methodological support for further study of machine learning in the quantitative analysis of spectra.
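The two-layer Stacking framework described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the hyperparameters, the standardization step, the synthetic "spectra", and the helper names `build_stacking_model` and `regression_metrics` are all assumptions made for the example. RDP is assumed here to be the ratio of the standard deviation of the reference values to RMSEP; the abstract does not define it.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, StackingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_stacking_model(random_state=0):
    """Two-layer stack: ERT and three SVM kernels as base learners,
    a linear-kernel SVM as meta-learner (hyperparameters are assumed)."""
    base_learners = [
        ("ert", ExtraTreesRegressor(n_estimators=100, random_state=random_state)),
        ("linear_svm", make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))),
        ("rbf_svm", make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))),
        ("poly_svm", make_pipeline(StandardScaler(), SVR(kernel="poly", degree=2, C=10.0))),
    ]
    # The meta-learner is trained on cross-validated base-learner predictions.
    meta_learner = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))
    return StackingRegressor(estimators=base_learners,
                             final_estimator=meta_learner, cv=5)


def regression_metrics(y_true, y_pred):
    """Correlation coefficient r, RMSEP, and RDP (assumed: SD of references / RMSEP)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    rmsep = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    rdp = float(np.std(y_true, ddof=1) / rmsep)
    return {"r": r, "RMSEP": rmsep, "RDP": rdp}


# Synthetic stand-in for spectra: 120 samples x 50 "wavelengths".
# Real diesel spectra and reference values would replace X and y.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))
y = 50.0 + 2.0 * X[:, 5] - 1.5 * X[:, 20] + 0.1 * rng.normal(size=120)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = build_stacking_model().fit(X_train, y_train)
y_pred = model.predict(X_test)
metrics = regression_metrics(y_test, y_pred)
print(metrics)
```

To mimic the wavelength-selection step, the same model could be fitted on a column subset such as `X_train[:, selected]`, where `selected` would hold the indices returned by SiPLS or SPA (neither is implemented in this sketch).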
鞠 薇,鲁昌华,张玉钧,陈晓静,蒋薇薇. 集成学习结合波长选取的有机物红外光谱定量回归方法研究[J]. 光谱学与光谱分析, 2023, 43(01): 239-247.
JU Wei, LU Chang-hua, ZHANG Yu-jun, CHEN Xiao-jing, JIANG Wei-wei. Research on Quantitative Regression Method of IR Spectra of Organic Compounds Based on Ensemble Learning With Wavelength Selection. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2023, 43(01): 239-247.