A Combinatorial Optimization Strategy for Near-Infrared Spectral Data Preprocessing
ZHOU Yu-kun, CHEN Xiao-jing, XIE Zhong-hao, SHI Wen*, YUAN Lei-ming, CHEN Xi, HUANG Guang-zao
School of Electrical and Electronic Engineering, Wenzhou University, Wenzhou 325000, China
Abstract Preprocessing is a key step in building a near-infrared (NIR) spectroscopy detection model and significantly affects detection accuracy. Many preprocessing methods are available, each designed to address specific types of noise and irrelevant information and thereby improve the signal-to-noise ratio, so optimizing the combination of methods is essential for achieving the desired model performance. This study proposes a combinatorial optimization strategy for selecting preprocessing methods when calibrating NIR spectroscopy models. Eight commonly used preprocessing methods are collected into a preprocessing library, and a quantitative model is built with the partial least squares (PLS) method. Preprocessing combinations with excellent calibration capability are then selected from the library, using the model's root mean square error of cross-validation (RMSECV) as the iteration criterion. The strategy is structured as a greedy algorithm: it selects the best-performing preprocessing method at each step in pursuit of a globally optimal combination, so that the preprocessing combination for a spectral dataset can be chosen simply and efficiently. The strategy was tested on publicly available datasets, including wheat and meat, and compared with a stacked strategy (Stacked) and the sequential preprocessing through orthogonalization strategy (SPORT). On the wheat dataset, the proposed strategy reduced the root mean square error of calibration (RMSEC) by 12% and 6%, and the root mean square error of prediction (RMSEP) by 32% and 17%, compared with the Stacked and SPORT strategies, respectively. On the meat dataset, it reduced RMSEC by 49% and 48%, and RMSEP by 46% and 41%, compared with the Stacked and SPORT strategies, respectively. These results demonstrate good calibration performance. Finally, the contribution of the preprocessing methods selected by the strategy to model calibration is analyzed, and the strategy's potential for model interpretability and the prevention of overfitting is discussed. The strategy offers a new approach to selecting preprocessing methods for NIR spectroscopy.
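The greedy selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three-method library (`snv`, `first_derivative`, `detrend`), the sequential application of the chosen methods, the stopping rule (stop when RMSECV no longer decreases), the fold layout, the compact NIPALS PLS1 routine, and the synthetic spectra are all assumptions made for illustration. The abstract specifies only an eight-method library, a PLS model, RMSECV as the iteration criterion, and a greedy search.

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def first_derivative(X):
    """First difference along the wavelength axis (a crude derivative)."""
    return np.diff(X, axis=1)

def detrend(X):
    """Subtract a least-squares straight line from each spectrum."""
    t = np.arange(X.shape[1])
    coef = np.polyfit(t, X.T, 1)  # row 0: slopes, row 1: intercepts
    return X - (np.outer(coef[0], t) + coef[1][:, None])

def pls1_fit_predict(X_train, y_train, X_test, n_components):
    """Fit a single-response PLS model (NIPALS) and predict X_test."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc, yc = X_train - x_mean, y_train - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # no covariance left to model
            break
        w = w / norm
        t = Xc @ w
        tt = t @ t
        p = Xc.T @ t / tt
        c = yc @ t / tt
        Xc = Xc - np.outer(t, p)  # deflate X
        yc = yc - c * t           # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.full(X_test.shape[0], y_mean)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression vector
    return (X_test - x_mean) @ B + y_mean

def rmsecv(X, y, n_components=3, n_folds=5):
    """Root mean square error of cross-validation with interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    residuals = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        pred = pls1_fit_predict(X[~test], y[~test], X[test], n_components)
        residuals[test] = y[test] - pred
    return float(np.sqrt(np.mean(residuals ** 2)))

def greedy_preprocessing(X, y, library, max_steps=3):
    """At each step, append the method that lowers RMSECV the most."""
    chosen, best_X, best_err = [], X, rmsecv(X, y)
    for _ in range(max_steps):
        trials = {n: f(best_X) for n, f in library.items() if n not in chosen}
        if not trials:
            break
        errs = {n: rmsecv(Xt, y) for n, Xt in trials.items()}
        name = min(errs, key=errs.get)
        if errs[name] >= best_err:  # assumed stopping rule: no improvement
            break
        chosen.append(name)
        best_X, best_err = trials[name], errs[name]
    return chosen, best_err

# Synthetic spectra (hypothetical data): one peak plus a random baseline.
rng = np.random.default_rng(0)
n, m = 40, 60
wl = np.linspace(0, 1, m)
y = rng.uniform(0, 1, n)
peak = np.exp(-((wl - 0.5) ** 2) / 0.005)
baseline = rng.uniform(0, 2, (n, 1)) + rng.uniform(-2, 2, (n, 1)) * wl
X = np.outer(y, peak) + baseline + 0.01 * rng.normal(size=(n, m))

library = {"snv": snv, "first_derivative": first_derivative, "detrend": detrend}
raw_err = rmsecv(X, y)
chosen, final_err = greedy_preprocessing(X, y, library)
print("selected:", chosen, "RMSECV raw:", raw_err, "RMSECV final:", final_err)
```

Because each step keeps only the single best candidate, the search evaluates at most `len(library) * max_steps` pipelines instead of every ordered combination. The trade-off is that a greedy search of this kind does not by itself guarantee the globally best combination.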
Received: 2024-01-18
Accepted: 2024-05-17
Corresponding Authors:
SHI Wen
E-mail: shiwen@wzu.edu.cn