Abstract: Preprocessing is an important step in constructing a near-infrared (NIR) spectroscopy detection model and significantly affects detection accuracy. Various preprocessing methods are available, each designed to address specific types of noise and irrelevant information, thereby improving the signal-to-noise ratio. Optimizing the combination of preprocessing methods is therefore essential for achieving the desired model performance. This study proposes a strategy for the combinatorial optimization of preprocessing methods for calibrating NIR spectroscopy models. Eight commonly used preprocessing methods are selected to establish a preprocessing library, and a quantitative model is built using partial least squares (PLS). Preprocessing combinations with excellent calibration capability are then selected from the library, using the root mean square error of cross-validation (RMSECV) of the model as the iterative criterion. The strategy is structured as a greedy algorithm: it searches for the optimal preprocessing method at each step and in this way approximates a globally optimal combination, so that preprocessing combinations for spectral data can be selected simply and efficiently. Tests were conducted on publicly available datasets, including wheat and meat, and the proposed strategy was compared with a similar stacked strategy (Stacked) and the sequential preprocessing through orthogonalization strategy (SPORT). On the wheat dataset, the proposed strategy reduced the root mean square error of calibration (RMSEC) by 12% and 6%, and the root mean square error of prediction (RMSEP) by 32% and 17%, compared with the Stacked and SPORT strategies, respectively. On the meat dataset, it reduced the RMSEC by 49% and 48%, and the RMSEP by 46% and 41%, compared with the Stacked and SPORT strategies, respectively. These results demonstrate good calibration performance. Finally, the study examines the contribution of the preprocessing methods selected by the strategy to model calibration and discusses the strategy's potential with respect to model interpretability and the prevention of overfitting. The strategy offers a new approach to selecting preprocessing methods for NIR spectroscopy.
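The greedy selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`greedy_select`, `loo_rmsecv`), the stopping rule (stop when no unused method lowers the RMSECV by more than a tolerance), the rule that each method is used at most once, and the two toy preprocessing steps are all assumptions made for illustration. To stay dependency-free, a leave-one-out univariate least-squares fit stands in for the PLS model; in practice the `rmsecv` callable would wrap a cross-validated PLS regression.

```python
from typing import Callable, Dict, List, Tuple

Spectra = List[List[float]]
Step = Callable[[Spectra], Spectra]


def greedy_select(
    spectra: Spectra,
    library: Dict[str, Step],
    rmsecv: Callable[[Spectra], float],
    tol: float = 1e-9,
) -> Tuple[List[str], float]:
    """Greedily build a preprocessing chain that lowers RMSECV.

    At each step every unused method is applied to the current spectra,
    and the one with the lowest RMSECV is kept. The search stops when no
    candidate improves the score by more than `tol` (assumed stopping rule).
    """
    chain: List[str] = []
    current = spectra
    best = rmsecv(current)  # baseline score on the raw spectra
    while True:
        trials = {n: f(current) for n, f in library.items() if n not in chain}
        if not trials:
            break
        scores = {n: rmsecv(x) for n, x in trials.items()}
        name = min(scores, key=scores.get)
        if best - scores[name] <= tol:  # no meaningful improvement
            break
        chain.append(name)
        best, current = scores[name], trials[name]
    return chain, best


def loo_rmsecv(feature: List[float], y: List[float]) -> float:
    """Leave-one-out RMSE of a one-variable least-squares fit (PLS stand-in)."""
    sq_err = 0.0
    for i in range(len(y)):
        xs, ys = feature[:i] + feature[i + 1:], y[:i] + y[i + 1:]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slope = sxy / sxx if sxx else 0.0
        sq_err += (y[i] - (my + slope * (feature[i] - mx))) ** 2
    return (sq_err / len(y)) ** 0.5


# Toy data (hypothetical): band 0 = analyte signal + baseline, band 1 = baseline.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
raw = [[1.5, 0.5], [4.0, 2.0], [4.0, 1.0], [7.0, 3.0], [5.0, 0.0], [8.5, 2.5]]

# Toy preprocessing library (hypothetical stand-ins for the eight methods).
library: Dict[str, Step] = {
    "offset": lambda x: [[v - min(row) for v in row] for row in x],
    "scale": lambda x: [[2.0 * v for v in row] for row in x],
}


def toy_rmsecv(x: Spectra) -> float:
    # Score a candidate by cross-validating a model on band 0 only.
    return loo_rmsecv([row[0] for row in x], y)


baseline = toy_rmsecv(raw)
chain, score = greedy_select(raw, library, toy_rmsecv)
print(chain, score, baseline)
```

On this toy data, removing the per-spectrum baseline offset makes band 0 an exact linear function of the reference values, so the search is expected to keep the offset step and then stop, since uniform scaling does not change the fit. Because each step only picks the locally best method, the resulting chain is not guaranteed to be the globally best combination.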