Research on Quantitative Regression Method of IR Spectra of Organic Compounds Based on Ensemble Learning With Wavelength Selection
JU Wei1, LU Chang-hua2, 3, ZHANG Yu-jun3, CHEN Xiao-jing1, JIANG Wei-wei2*
1. School of Internet, Anhui University, Hefei 230039, China
2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
3. Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
Abstract This work studies the application of ensemble learning to the quantitative analysis of infrared spectra of organic compounds, and the influence of characteristic wavelength selection on the modeling efficiency and prediction accuracy of the resulting ensemble models. The cetane number and total aromatic hydrocarbon content of diesel, predicted from its infrared spectra, are taken as the research objects. First, a two-layer stacking ensemble learning framework is established with extremely randomized trees (ERT), a linear kernel support vector machine (LinearSVM), a radial basis function kernel support vector machine (RBFSVM) and a polynomial kernel support vector machine (polySVM) as base learners, and LinearSVM as the meta-learner. The quantitative regression accuracy of the single base learners and of the ensemble learning model on the diesel infrared spectra is analyzed and compared. Compared with the partial least squares (PLS) quantitative regression model, the Stacking ensemble learning model improves the prediction accuracy for both target quantities. For cetane number, the ERT model gives the best prediction (r=0.848, RMSEP=1.603, RDP=2.627); for total aromatic content, the Stacking model gives the best prediction (r=0.991, RMSEP=0.645, RDP=9.243). Next, the characteristic wavelengths of the infrared spectra are selected with synergy interval partial least squares (SiPLS) and the successive projections algorithm (SPA), and ensemble learning quantitative regression models are established on the selected wavelengths. Among these, the SiPLS-ERT model gives the best prediction for cetane number (r=0.893, RMSEP=1.013, RDP=3.051), and the SiPLS-Stacking model gives the best prediction for total aromatic content (r=0.998, RMSEP=0.354, RDP=11.475). The average training time of the models is reduced by more than 50% compared with training on the full spectra.
The results show that characteristic wavelength selection combined with ensemble learning quantitative regression modeling can be used in the quantitative analysis of organic infrared spectra. Compared with the traditional PLS quantitative regression method, both the modeling efficiency and the prediction accuracy of this method are improved, which provides methodological support for further study of machine learning in the quantitative analysis of spectra.
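The two-layer stacking framework described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the hyperparameters, the standardization step, the 5-fold cross-validation, and the synthetic "spectra" are all assumptions, and the helper names `build_stacking_model`, `regression_metrics` and `make_synthetic_spectra` are hypothetical. The abstract does not define RDP; it is interpreted here as the ratio of the standard deviation of the reference values to RMSEP, which is also an assumption.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, StackingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_stacking_model(random_state=0):
    """Two-layer stacking: ERT and three SVMs as base learners, linear SVM on top.

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    base_learners = [
        ("ert", ExtraTreesRegressor(n_estimators=100, random_state=random_state)),
        ("linear_svm", make_pipeline(StandardScaler(), SVR(kernel="linear"))),
        ("rbf_svm", make_pipeline(StandardScaler(), SVR(kernel="rbf"))),
        ("poly_svm", make_pipeline(StandardScaler(), SVR(kernel="poly", degree=2))),
    ]
    # The meta-learner is trained on out-of-fold predictions of the base learners.
    meta_learner = make_pipeline(StandardScaler(), SVR(kernel="linear"))
    return StackingRegressor(
        estimators=base_learners, final_estimator=meta_learner, cv=5
    )


def regression_metrics(y_true, y_pred):
    """Return (r, RMSEP, ratio of reference standard deviation to RMSEP)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
    # Assumed definition of the third metric: SD of reference values / RMSEP.
    ratio = np.std(y_true, ddof=1) / rmsep
    return r, rmsep, ratio


def make_synthetic_spectra(n_samples=80, n_wavelengths=30, random_state=0):
    """Synthetic stand-in for spectra: the target depends on three 'wavelengths'."""
    rng = np.random.default_rng(random_state)
    X = rng.normal(size=(n_samples, n_wavelengths))
    coef = np.zeros(n_wavelengths)
    coef[[3, 10, 20]] = [2.0, -1.0, 0.5]
    y = X @ coef + 0.05 * rng.normal(size=n_samples)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    model = build_stacking_model().fit(X[:60], y[:60])
    r, rmsep, ratio = regression_metrics(y[60:], model.predict(X[60:]))
    print(f"r={r:.3f}, RMSEP={rmsep:.3f}, SD/RMSEP={ratio:.3f}")
```

To mimic the wavelength-selection variant, the same model could be fitted on a column subset such as `X[:, selected]`, where `selected` would hold the indices returned by a SiPLS or SPA routine (not shown here).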
Received: 2021-10-13
Accepted: 2022-04-19
Corresponding Authors:
JIANG Wei-wei
E-mail: jiangww@hfut.edu.cn