Wavelength Selection Method of Near-Infrared Spectrum Based on Random Forest Feature Importance and Interval Partial Least Square Method
CHEN Rui1, WANG Xue1, 2*, WANG Zi-wen1, QU Hao1, MA Tie-min1, CHEN Zheng-guang1, GAO Rui3
1. College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China
2. Daqing Center of Inspection and Testing for Agricultural Products and Processed Products, Ministry of Agriculture and Rural Affairs, Daqing 163319, China
3. School of Electrical and Information, Northeast Agricultural University, Harbin 150030, China
Abstract When building rapid quantitative analysis models for near-infrared spectroscopy, feature wavelength selection is one of the more effective ways to improve prediction accuracy: selecting the informative wavelengths reduces redundant data and makes the data set more useful for modeling. Random Forest (RF) is an ensemble algorithm that can be used to calculate the feature importance of each spectral wavelength. Following the mean decrease accuracy (MDA) method on out-of-bag (OOB) data, the averaged mean squared error value is taken as the feature importance. Feature variables are then selected by setting a feature importance threshold, forming a feature wavelength subset. However, there is no theoretical basis for choosing the threshold, so the range of feature importance thresholds needs to be explored. In addition, because RF is random, invalid or even interfering variables may enter the feature wavelength subset, so the effectiveness of the selected variables cannot be guaranteed. The RF-iPLS feature wavelength selection algorithm is therefore proposed. The feature wavelength subset is divided into intervals by interval partial least squares (iPLS), which addresses both the invalid variables caused by RF randomness and the redundant information associated with iPLS. To illustrate the rationality of RF-iPLS, an RF-MC-iPLS algorithm was constructed using the Monte Carlo (MC) method, in which the comparison feature subset is generated after 500 rounds of sampling. Although the structure of RF-iPLS is similar to that of RF-MC-iPLS, its running time is 11.12% shorter. The results show that RF-iPLS selects feature wavelengths effectively and has low time complexity in the prediction model. To further verify its effectiveness, RF-iPLS was applied to near-infrared spectroscopy data sets of grain protein, and PLSR models were established and compared with the full-spectrum PLSR model and with PLSR models based on other wavelength selection methods. Whereas the full spectrum contains 117 wavelength points, RF-iPLS selects 12 feature wavelength points. The RMSEC of the modeling (calibration) set falls from 2.61 to 0.64, an improvement of about 75.5%, and the RMSEP of the prediction set falls from 2.63 to 0.69, an improvement of about 73.8%. These results show that RF-iPLS is an effective feature wavelength selection method that simplifies near-infrared spectral quantitative analysis models and achieves efficient dimensionality reduction.
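The selection steps described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the importance scores, the threshold value, and the number of intervals are hypothetical, the function names are invented for this example, and the RF importance calculation and the per-interval PLS evaluation are omitted. The last helper reproduces the relative RMSE reductions quoted above.

```python
from typing import List, Sequence


def select_by_threshold(importance: Sequence[float], threshold: float) -> List[int]:
    """Return indices of wavelengths whose importance is at least `threshold`."""
    return [i for i, score in enumerate(importance) if score >= threshold]


def split_into_intervals(indices: Sequence[int], n_intervals: int) -> List[List[int]]:
    """Split the selected indices into contiguous groups of near-equal size."""
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    base, extra = divmod(len(indices), n_intervals)
    intervals: List[List[int]] = []
    start = 0
    for k in range(n_intervals):
        # The first `extra` intervals take one additional index each.
        size = base + (1 if k < extra else 0)
        intervals.append(list(indices[start:start + size]))
        start += size
    return intervals


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which an error metric falls from `before` to `after`."""
    return (before - after) / before * 100.0


# Hypothetical importance scores for eight wavelength points (not from the paper).
scores = [0.02, 0.31, 0.27, 0.05, 0.44, 0.01, 0.38, 0.29]
subset = select_by_threshold(scores, threshold=0.25)  # assumed threshold
intervals = split_into_intervals(subset, n_intervals=2)  # assumed interval count
print(subset, intervals)

# RMSE values as reported in the abstract.
print(f"RMSEC reduction: {relative_reduction(2.61, 0.64):.1f}%")
print(f"RMSEP reduction: {relative_reduction(2.63, 0.69):.1f}%")
```

In a full pipeline, each interval would be scored by fitting a PLS model on its wavelengths and comparing cross-validated errors; that step is left out here to keep the sketch free of external dependencies.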
Received: 2022-02-07
Accepted: 2022-06-16
Corresponding Authors:
WANG Xue
E-mail: mtmwx@163.com