Characteristic Wavelength Selection Method and Application of Near Infrared Spectrum Based on Lasso Huber
GUO Tuo¹, XU Feng-jie¹, MA Jin-fang²*, XIAO Huan-xian³
1. College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an 710021, China
2. Department of Optoelectronics, Jinan University, Guangzhou 510632, China
3. Jiangxi Baoli Pharmaceutical Co., Ltd., Ganzhou 341900, China
Abstract In near-infrared spectroscopy (NIRS) wavelength screening, selecting characteristic wavelengths is a challenging problem when the number of variables is much larger than the sample size. The Lasso and Elastic Net algorithms are used for variable selection on high-dimensional, small-sample data, but both measure the loss with the squared error when selecting characteristic variables. Consequently, when the samples contain outliers, models built with Lasso or Elastic Net are sensitive to them: the fitted model is pulled toward the outliers and its robustness decreases. To address this problem, this paper uses the Huber function as the loss function and proposes the Lasso-Huber method for near-infrared characteristic wavelength selection. Combined with partial least squares (PLS), quantitative calibration models for the quality control index components of Antai pills were established and compared with full-wavelength modeling and with wavelength selection by the Lasso and Elastic-Net methods. In this experiment, 116 NIR spectra were collected from 21 batches of Antai pills. Of these, 101 spectra formed the calibration set, on which the model was verified internally by leave-one-out cross-validation; the remaining 15 spectra formed the validation set for external verification. Outliers in the calibration set were detected with the Mahalanobis distance (MD) method based on principal component analysis (PCA). Taking ferulic acid, one of the quality control index components of Antai pills, as an example, the Lasso, Elastic-Net and Lasso-Huber methods selected 69, 155 and 87 characteristic wavelength points, respectively, from the normal spectra of the Antai pill samples. The model built with Lasso-Huber combined with PLS performed best, with a prediction-set R²p of 0.9531 and SEP of 0.0587.
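The abstract does not specify how the Lasso-Huber problem is solved or which penalty weight and Huber threshold were used. A minimal sketch, assuming the objective (1/n)·Σ ρ_δ(y_i - b0 - x_iᵀβ) + λ‖β‖₁ with the Huber function ρ_δ(r) = r²/2 for |r| ≤ δ and δ|r| - δ²/2 otherwise, is proximal gradient descent (ISTA): a gradient step on the Huber loss, whose derivative clips every residual to ±δ so that an outlying spectrum has bounded influence, followed by L1 soft-thresholding, which sets uninformative coefficients exactly to zero. The wavelengths with non-zero coefficients are then the selected characteristic variables. The function names `lasso_huber`, `huber_psi`, `soft_threshold` and `selected_wavelengths`, the values λ = 0.3 and δ = 1.345 (which presumes residuals on roughly unit scale), the iteration count and the synthetic data are all illustrative assumptions; real spectra would normally be autoscaled first, and the authors' own implementation may differ.

```python
import random

# Hypothetical sketch of Lasso-Huber variable selection; lam, delta and the
# synthetic data are illustrative assumptions, not the paper's settings.


def huber_psi(r, delta):
    """Derivative of the Huber loss: r inside [-delta, delta], clipped to +/-delta outside."""
    return max(-delta, min(delta, r))


def soft_threshold(z, t):
    """Proximal operator of t*|z|: the L1 (Lasso) shrinkage step."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_huber(X, y, lam, delta=1.345, n_iter=1000):
    """Minimise (1/n)*sum(huber(y_i - b0 - x_i.beta)) + lam*||beta||_1 with ISTA.

    X: list of n spectra (rows), each with p wavelength variables.
    The intercept b0 is not penalised. delta=1.345 assumes unit-scale residuals.
    Returns (beta, b0).
    """
    n, p = len(X), len(X[0])
    # The loss gradient is Lipschitz with constant <= ||[X, 1]||_F^2 / n,
    # so 1 / that bound is a safe (if conservative) step size.
    step = n / (sum(v * v for row in X for v in row) + n)
    beta, b0 = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Clipped residuals: each outlying sample contributes at most +/-delta.
        psi = [huber_psi(y[i] - b0 - sum(X[i][j] * beta[j] for j in range(p)), delta)
               for i in range(n)]
        grad = [-sum(X[i][j] * psi[i] for i in range(n)) / n for j in range(p)]
        b0 += step * sum(psi) / n  # gradient step on the unpenalised intercept
        # Gradient step on beta, then soft-thresholding (produces exact zeros).
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam) for j in range(p)]
    return beta, b0


def selected_wavelengths(beta):
    """Indices of the variables (wavelengths) whose coefficients are non-zero."""
    return [j for j, b in enumerate(beta) if b != 0.0]


if __name__ == "__main__":
    # Synthetic stand-in for spectra: only variables 0 and 1 carry signal.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(80)]
    y = [3 * x[0] - 2 * x[1] + rng.gauss(0, 0.1) for x in X]
    for i in (5, 17, 33):  # three gross outliers in the response
        y[i] += 50.0
    beta, b0 = lasso_huber(X, y, lam=0.3)
    print("selected:", selected_wavelengths(beta))
```

On this synthetic data the three gross outliers are expected to leave the support unchanged, so only variables 0 and 1 are selected; in the paper's workflow, the selected wavelengths would then be passed to a PLS model. Pure Python keeps the sketch dependency-free, but a vectorized implementation would be the practical choice for full NIR spectra.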
In addition, comparing the prediction performance of calibration models built on normal spectra with that of models whose calibration set contained outliers showed that the Lasso-Huber method is more advantageous when outliers are present. In that case, the optimal number of wavelength points selected by the Lasso-Huber algorithm was 88, and the model combined with PLS achieved an R²v of 0.9673, compared with 0.8405 for the Lasso method, 0.8347 for the Elastic-Net method and 0.8520 for full-wavelength modeling. Thus, for samples containing outliers, the Lasso-Huber method not only reduces the number of characteristic bands but also lowers the algorithm's sensitivity to outliers, improving the accuracy and robustness of the model. In terms of model simplicity and computational cost, modeling with Lasso and Elastic-Net took 61.8260 s and 79.9599 s, respectively, whereas Lasso-Huber took only 1.3608 s. The algorithm is therefore expected to be integrated into near-infrared spectroscopy modeling software for practical production applications.
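The PCA-based Mahalanobis distance screening used to separate normal spectra from outliers in the calibration set can be sketched as below. The abstract reports neither the number of principal components nor the cutoff, so k = 2, the χ² cutoff and the function names `pca_scores`, `mahalanobis_sq` and `flag_outliers` are hypothetical. Because PCA scores are uncorrelated, the Mahalanobis distance in the k-dimensional score space reduces to MD²_i = Σ_c t_ic²/λ_c, where t_ic is the score of sample i on component c and λ_c is that component's variance. The components are found by power iteration with deflation to keep the sketch dependency-free, and the cutoff -2 ln(0.001) ≈ 13.82 is the 99.9% quantile of a χ² distribution with two degrees of freedom, which presumes roughly Gaussian scores.

```python
import math
import random

# Hypothetical sketch of PCA + Mahalanobis distance outlier screening;
# k, the cutoff and the synthetic data are assumptions, not the paper's values.


def pca_scores(X, k, n_iter=500, seed=0):
    """Scores and variances of the first k principal components of X (list of rows).

    Uses power iteration on the sample covariance matrix with deflation.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centred data
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    scores = [[0.0] * k for _ in range(n)]
    variances = []
    for c in range(k):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
        variances.append(lam)  # variance of the scores on component c
        for i in range(n):
            scores[i][c] = sum(Xc[i][j] * v[j] for j in range(p))
        # Deflation: remove component c so the next power iteration finds c + 1.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return scores, variances


def mahalanobis_sq(X, k):
    """Squared Mahalanobis distance of every sample in the k-dimensional PCA score space."""
    scores, variances = pca_scores(X, k)
    # Scores are uncorrelated, so the score covariance matrix is diagonal.
    return [sum(t[c] ** 2 / variances[c] for c in range(k)) for t in scores]


def flag_outliers(X, k, cutoff):
    """Indices of samples whose squared Mahalanobis distance exceeds cutoff."""
    return [i for i, d in enumerate(mahalanobis_sq(X, k)) if d > cutoff]


if __name__ == "__main__":
    rng = random.Random(1)
    X = []
    for i in range(40):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        if i == 7:
            a = 15.0  # one deliberately extreme sample
        # Two latent factors plus small noise stand in for spectra.
        X.append([u + rng.gauss(0, 0.05) for u in (a + b, a - b, a + b, a - b, a)])
    cutoff = -2.0 * math.log(0.001)  # chi-square(2) 99.9% quantile, about 13.82
    print("outliers:", flag_outliers(X, 2, cutoff))
```

For real spectra with hundreds of wavelengths, building the p × p covariance matrix in pure Python is slow; an SVD of the mean-centred data matrix (for example with `numpy.linalg.svd`) yields the same scores far more efficiently.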
Received: 2022-07-10
Accepted: 2022-12-15
|
|
Corresponding Authors:
MA Jin-fang
E-mail: majf0351@126.com
|
|
|
|
|
|