Variable Selection Method of NIR Spectroscopy Based on Least Angle Regression and GA-PLS
YAN Sheng-ke1, YANG Hui-hua1, 2*, HU Bai-chao1, REN Chao-chao1, LIU Zhen-bing1
1. College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
2. College of Automation, Beijing University of Posts & Telecommunications, Beijing 100876, China
Abstract Near-infrared (NIR) spectra usually contain many wavelength variables, so direct or indirect variable selection is crucial for improving the stability and predictive performance of a calibration model. Least angle regression (LAR) is a relatively new and efficient machine learning algorithm for regression analysis and variable selection. This paper proposes a wavelength selection method for spectral modeling that combines LAR with the genetic algorithm-partial least squares (GA-PLS) algorithm and can effectively screen out a small number of wavelength points. First, LAR is used to eliminate multicollinearity among the variables in the full spectral region, yielding a reduced feature set; GA-PLS then selects variables from this reduced set to reduce the dimensionality further. To verify the method, regression analyses were carried out on the NIR spectra of tablets and gasoline. Variables were selected from the pre-processed spectra, and models were built for the content of active ingredients (tablets) and of C10 (gasoline). The optimal number of variables was only 7 in both applications, and the coefficient of determination for prediction (R2p) reached 0.9339 and 0.9519, respectively. Moreover, compared with the full-spectrum, uninformative variable elimination (UVE), and successive projections algorithm (SPA) models, the method requires fewer wavelength points and achieves better R2p and root mean square error of prediction (RMSEP). Therefore, LAR combined with GA-PLS not only picks out informative variables from NIR spectra, reducing the number of modeling variables and improving prediction accuracy, but also makes the model easier to interpret. The method can serve as an effective wavelength selection tool for designing dedicated spectrometers for specific application areas.
Received: 2016-03-09
Accepted: 2016-07-24
Corresponding Author:
YANG Hui-hua
E-mail: 13718680586@139.com