Characteristic Wavelength Selection Method and Application of
Near Infrared Spectrum Based on Lasso Huber
GUO Tuo1, XU Feng-jie1, MA Jin-fang2*, XIAO Huan-xian3
1. College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an 710021, China
2. Department of Optoelectronics, Jinan University, Guangzhou 510632, China
3. Jiangxi Baoli Pharmaceutical Co., Ltd., Ganzhou 341900, China
Abstract: In near-infrared spectroscopy (NIRS) wavelength screening, selecting characteristic wavelengths is a challenging problem when the number of variables is much larger than the sample size. The Lasso and Elastic Net algorithms are commonly used for variable selection on high-dimensional, small-sample data, but both measure loss with the least-squares error when selecting characteristic variables. Consequently, when the samples contain outliers, models built with Lasso or Elastic Net are sensitive to them: the fit shifts toward the outliers and robustness is reduced. To address this problem, this paper uses the Huber function as the loss function and proposes the Lasso-Huber method for near-infrared characteristic wavelength selection. Combined with the partial least squares (PLS) method, a quantitative calibration model of the quality control index components of Antai pills is established, and its performance is compared with that of full-wavelength modeling and of wavelength selection by the Lasso and Elastic-Net methods. In this experiment, 116 NIRS spectra from 21 batches of Antai pills were collected, of which 101 were used as the calibration set. The model was internally verified by leave-one-out cross-validation, and the other 15 were used as the validation set for external verification. The Mahalanobis distance (MD) method based on principal component analysis (PCA) was used to detect outliers in the calibration set. Taking ferulic acid, one of the quality control index components of Antai pills, as an example, the Lasso, Elastic-Net and Lasso-Huber methods selected 69, 155 and 87 characteristic wavelength points, respectively, from the normal spectra of Antai pill samples. The prediction model established by the Lasso-Huber method combined with PLS performed best, with an R²p of 0.9531 and an SEP of 0.0587 on the prediction set.
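As an illustration of PCA-based Mahalanobis distance screening, the following is a minimal sketch on synthetic data, not the authors' implementation. The function name `pca_mahalanobis`, the number of components, and the simulated spectra are all assumptions made for the example; the abstract does not state the number of components or the rejection threshold used, so none is applied here.

```python
import numpy as np

def pca_mahalanobis(X, n_components=2):
    """Mahalanobis distance of each sample in PCA score space (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                           # mean-center each wavelength
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)  # PCA via SVD
    scores = U[:, :n_components] * s[:n_components]   # sample scores on leading PCs
    var = s[:n_components] ** 2 / (X.shape[0] - 1)    # variance of each score column
    # PCA scores are uncorrelated, so their covariance matrix is diagonal
    # and the Mahalanobis distance reduces to a variance-scaled Euclidean norm.
    return np.sqrt((scores ** 2 / var).sum(axis=1))

# Synthetic "spectra" (assumed data): 30 samples x 50 wavelengths.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 3.0, 50)
conc = rng.normal(size=30)
spectra = np.outer(1.0 + 0.1 * conc, np.sin(grid)) + 0.01 * rng.normal(size=(30, 50))
spectra[7] += 1.0 * np.cos(grid)                      # distort sample 7 on purpose

md = pca_mahalanobis(spectra, n_components=2)
suspect = int(np.argmax(md))                          # most atypical sample
```

In practice the distances would be compared against a chosen cutoff, and samples above it would be treated as outlier candidates in the calibration set.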
In addition, comparing the prediction performance of calibration models built on normal spectra with that of models whose calibration set contains outliers showed that the Lasso-Huber method is more advantageous when modeling with outliers. The results showed that the optimal number of wavelength points selected by the Lasso-Huber algorithm was 88, and the R²v of the model combined with PLS was 0.9673, whereas the R²v was 0.8405 for the Lasso method, 0.8347 for the Elastic-Net method, and 0.8520 for full-wavelength modeling. Thus, for samples with outliers, the Lasso-Huber method not only reduces the number of characteristic bands but also reduces the algorithm's sensitivity to outliers, improving the accuracy and robustness of the model. In terms of model simplification, the modeling times of Lasso and Elastic-Net were 61.8260 s and 79.9599 s, while that of Lasso-Huber was only 1.3608 s. The algorithm is therefore expected to be integrated into near-infrared spectroscopy modeling software for practical production applications in the future.
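To make the idea of combining a Huber loss with an L1 penalty concrete, the sketch below minimizes (1/n) Σ H_δ(y_i − b0 − x_i·β) + λ‖β‖₁ by proximal gradient descent (ISTA). This is an assumption-laden illustration, not the solver used in the paper: the function names, the values of `lam` and `delta`, the solver choice, and the synthetic data are all hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_huber(X, y, lam=0.3, delta=1.0, n_iter=3000):
    """L1-penalized Huber regression via proximal gradient (illustrative only)."""
    n, p = X.shape
    A = np.hstack([np.ones((n, 1)), X])        # first column carries the intercept
    step = n / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the gradient
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        # Derivative of the Huber loss: residuals clipped to [-delta, delta],
        # so a gross outlier contributes no more than delta to the gradient.
        psi = np.clip(y - A @ w, -delta, delta)
        w = w + step * (A.T @ psi) / n          # gradient step on the Huber term
        w[1:] = soft_threshold(w[1:], step * lam)  # L1 shrinkage, intercept unpenalized
    return w[0], w[1:]

# Synthetic example (assumed data): 60 samples, 100 "wavelengths", 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 100))
beta_true = np.zeros(100)
beta_true[[5, 40, 80]] = [3.0, -2.0, 2.5]
y = 1.0 + X @ beta_true + 0.1 * rng.normal(size=60)
y[:3] += 20.0                                   # three gross outliers in the response

b0, beta = lasso_huber(X, y)
selected = np.flatnonzero(beta)                 # indices of retained variables
top3 = {int(i) for i in np.argsort(np.abs(beta))[-3:]}
```

The nonzero entries of `beta` play the role of the selected characteristic wavelengths; in a workflow like the one described, only those columns would then be passed on to PLS modeling.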
Key words: Near-infrared spectroscopy; Wavelength selection; Large dimension and small sample; Quantitative calibration model; Lasso-Huber
郭 拓,徐凤捷,马晋芳,肖环贤. 基于Lasso-Huber的近红外光谱特征波长选择方法及应用[J]. 光谱学与光谱分析, 2024, 44(03): 737-743.
GUO Tuo, XU Feng-jie, MA Jin-fang, XIAO Huan-xian. Characteristic Wavelength Selection Method and Application of Near Infrared Spectrum Based on Lasso Huber. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2024, 44(03): 737-743.