A Variable Selection Method of the Selectivity Ratio Competitive Model Population Analysis for Near Infrared Spectroscopy
WANG Yu-xi1, JIA Zhen-hong1*, YANG Jie2, Nikola K Kasabov3
1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
2. Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200240, China
3. Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland 1020, New Zealand
Abstract Spectral analysis is an important application of chemometrics and is widely used in many fields, and spectral variable selection is a key step within it. It is therefore important to study variable selection methods that objectively identify informative variables and eliminate irrelevant or interfering ones. In this study, a new variable selection method, selectivity ratio competitive model population analysis (SRCMPA), is proposed. The algorithm combines the selectivity ratio (SR), adaptive weighted sampling and model population analysis with variable ranking and an exponentially decreasing function (EDF). Key wavelengths are defined as those with high scores in the regression model. Here, the SR score under a partial least squares (PLS) model is used as the index of each wavelength's importance. According to this importance, SRCMPA sequentially selects N wavelength subsets through Monte Carlo sampling and runs in an iterative, competitive manner. In each sampling run, a PLS model is built on a fixed proportion of the samples and the SR value of every variable is calculated. The variables are ranked by SR score and the normalized SR scores are used as weights. Key variables are then selected in two steps: compulsory selection by the exponentially decreasing function and competitive selection by adaptive weighted sampling. Finally, cross-validation (CV) is used to choose the optimal subset, namely the one with the lowest root mean square error of cross-validation (RMSECV). The algorithm was tested on a wheat protein data set and a beer data set and compared with three efficient algorithms.
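The abstract does not spell out how the selectivity ratio is computed. A common definition uses target projection of a PLS model. The normalised PLS regression vector serves as a projection direction, and for each wavelength j the SR is the variance explained by the target-projected component divided by the residual variance. The NumPy sketch below assumes that definition, a single response y and NIPALS PLS1. The function names `pls1_coefficients` and `selectivity_ratio` are illustrative and are not the authors' code.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """PLS1 regression vector (for mean-centred data) computed with NIPALS."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # X weights
    P = np.zeros((n_features, n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)               # deflate X
        yk = yk - q[a] * t                     # deflate y
        W[:, a], P[:, a] = w, p
    # Regression vector b = W (P^T W)^{-1} q
    return W @ np.linalg.solve(P.T @ W, q)


def selectivity_ratio(X, y, n_components):
    """SR_j = explained / residual variance of variable j after target projection."""
    Xc = X - X.mean(axis=0)
    b = pls1_coefficients(X, y, n_components)
    w_tp = b / np.linalg.norm(b)                # target-projection weight
    t_tp = Xc @ w_tp                            # target-projected scores
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)          # target-projected loadings
    X_expl = np.outer(t_tp, p_tp)               # part of X explained by the target
    v_expl = (X_expl ** 2).sum(axis=0)
    v_res = ((Xc - X_expl) ** 2).sum(axis=0)
    # Guard against a zero residual for a perfectly explained variable
    return v_expl / np.maximum(v_res, np.finfo(float).tiny)
```

In a SRCMPA-style loop, this function would be called once per Monte Carlo run on the sampled calibration subset and the currently active wavelengths.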
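The two-step selection can be sketched as follows. The sketch assumes the exponentially decreasing function used in competitive adaptive reweighted sampling (CARS): r_i = a·exp(-k·i), with a = (p/2)^(1/(N-1)) and k = ln(p/2)/(N-1). Under that form all p variables are kept in the first of N runs and two in the last. The helper names `edf_ratio` and `select_step` are hypothetical, drawing with replacement is an assumption borrowed from CARS, and the SR scores are taken as given.

```python
import numpy as np


def edf_ratio(i, n_runs, n_vars):
    """Fraction of variables kept at run i (1-based): 1 at i=1, 2/n_vars at i=n_runs."""
    k = np.log(n_vars / 2) / (n_runs - 1)
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    return a * np.exp(-k * i)


def select_step(sr_scores, active, i, n_runs, n_vars, rng):
    """One EDF + adaptive weighted sampling step.

    sr_scores: SR value of each active variable, in the same order as `active`.
    active: array of indices of the variables still in competition.
    """
    n_keep = max(2, int(round(n_vars * edf_ratio(i, n_runs, n_vars))))
    # Step 1 (compulsory): keep only the n_keep variables with the largest SR.
    order = np.argsort(sr_scores)[::-1][:n_keep]
    survivors = active[order]
    weights = sr_scores[order] / sr_scores[order].sum()   # normalised SR
    # Step 2 (competitive): weighted sampling with replacement, so variables
    # with larger normalised SR are more likely to survive this run.
    drawn = rng.choice(survivors, size=n_keep, replace=True, p=weights)
    return np.unique(drawn)
```

Because draws are made with replacement, the number of distinct survivors is usually smaller than `n_keep`. This is what lets weak variables drop out faster than the EDF alone would dictate.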
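The final step keeps the candidate subset with the lowest RMSECV, where RMSECV = sqrt(Σ(y_i - ŷ_i)² / n) and each ŷ_i comes from a model that did not see sample i. In the sketch below, ordinary least squares with an intercept stands in for PLS to keep it short, and the round-robin fold assignment is an arbitrary choice. Both are assumptions, not part of the described method. The names `rmsecv` and `best_subset` are illustrative.

```python
import numpy as np


def rmsecv(X, y, n_folds=5):
    """Root mean square error of k-fold cross-validation for an OLS model with intercept."""
    n = len(y)
    folds = np.arange(n) % n_folds              # round-robin fold assignment
    press = 0.0                                 # prediction error sum of squares
    for f in range(n_folds):
        train, test = folds != f, folds == f
        A = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        y_hat = np.column_stack([np.ones(test.sum()), X[test]]) @ coef
        press += ((y[test] - y_hat) ** 2).sum()
    return np.sqrt(press / n)


def best_subset(X, y, subsets, n_folds=5):
    """Return the candidate subset (array of column indices) with the lowest RMSECV."""
    scores = [rmsecv(X[:, s], y, n_folds) for s in subsets]
    i = int(np.argmin(scores))
    return subsets[i], scores[i]
```

In SRCMPA the candidates would be the N subsets produced by the sampling runs.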
The experimental results show that the algorithm finds the best combination of key wavelength variables for each data set. The selected variables can be used to explain the chemical characteristics of interest, and the resulting models give the best evaluation results among the compared methods. For the beer data set, compared with the full-spectrum PLS model, the number of variables is reduced from 567 to about 42. RMSECV decreases from 0.622 to 0.115 and RMSEP from 0.823 to 0.363, which are relative reductions of 81.5% and 55.9%, respectively. Q2_CV and Q2_test increase from 0.940 and 0.852 to 0.994 and 0.995. For the wheat protein data set, compared with the full-spectrum PLS model, the number of variables is reduced from 175 to about 18. RMSECV decreases from 0.607 to 0.292 and RMSEP from 0.519 to 0.234, which are relative reductions of 51.9% and 54.9%, respectively. Q2_CV and Q2_test increase from 0.748 and 0.774 to 0.931 and 0.839.
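The percentages reported for the beer and wheat protein data sets are relative reductions of the error metrics, 100 × (before - after) / before. The short check below uses only the numbers given in the abstract. The function name `relative_reduction` is illustrative.

```python
def relative_reduction(before, after):
    """Percentage decrease of an error metric such as RMSECV or RMSEP."""
    return 100.0 * (before - after) / before


# (data set, metric): (full-spectrum PLS, SRCMPA), values as reported in the abstract
reported = {
    ("beer", "RMSECV"): (0.622, 0.115),
    ("beer", "RMSEP"): (0.823, 0.363),
    ("wheat protein", "RMSECV"): (0.607, 0.292),
    ("wheat protein", "RMSEP"): (0.519, 0.234),
}

for key, (full, selected) in reported.items():
    # prints 81.5, 55.9, 51.9 and 54.9 in turn
    print(key, round(relative_reduction(full, selected), 1))
```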
Received: 2019-04-08
Accepted: 2019-09-02
Corresponding Author:
JIA Zhen-hong
E-mail: jzhh@xju.edu.cn