Abstract: Xylose, a functional sugar, offers health benefits such as antioxidant activity and the promotion of intestinal health, and is widely used in food, medicine, and biofuels. However, effective methods for rapidly determining xylose content are still lacking. To address content detection during xylose production, an online detection method based on near-infrared (NIR) spectroscopy is proposed. First, sample solutions are collected and scanned with a near-infrared spectrometer to obtain raw spectra, which are then preprocessed with first-derivative and smoothing filters to remove noise and baseline drift. Next, the random frog algorithm (RJFA) is used to select spectral variables, and the relative analysis error of prediction (RPD) is used to search for the optimal number of features. The results show that predictive performance is best when 20 to 30 features are retained; taking other indicators into account, 25 features are selected as the characteristic wavelengths representing xylose content. Because the random frog algorithm relies on random subset selection and random forest regression, it has clear advantages in screening characteristic wavelengths from high-dimensional xylose spectral data, but its results have low reproducibility. Therefore, after the wavelength features are obtained, the results of repeated runs are weighted and accumulated to reduce the influence of the algorithm's randomness on the final model. A predictive model for xylose content is then established, using xylose contents measured by liquid chromatography as labels. Finally, the method is applied to rapidly determine the xylose content of samples collected at the production site, and its predictions are compared with those of PLS and Lasso models.
The results indicate that the coefficient of determination is R² = 0.9377 for the training set and R²p = 0.9335 for the test set. Both values are close to 1, indicating that the model explains the training data well and generalizes well. The root mean square error of prediction is RMSEP = 5.8446, and the relative analysis error of prediction is RPD = 3.8792 > 2.5, indicating that the model can predict the xylose content of samples fairly accurately. The comparison shows that all evaluation indicators of the RJFA-PLS model are better than those of the PLS model: RMSEP is reduced by 112.7%, and R², RPD, and R²p are increased by 21.8%, 52.5%, and 24.6%, respectively. In contrast, the Lasso algorithm performs poorly in predicting xylose content on this dataset. Under the experimental conditions of this study, the model established with the proposed method is therefore better suited to predicting xylose content than the PLS and Lasso models. The proposed method addresses the lag in obtaining xylose content results and lays the groundwork for research on online detection technology for xylose.
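The evaluation indicators quoted above can be computed with a short sketch. It assumes the common definitions R² = 1 − SS_res/SS_tot, RMSEP as the root mean square error on the test set, and RPD as the sample standard deviation of the test-set reference values divided by RMSEP. The abstract does not state which standard deviation convention was used, so that choice is an assumption, and the values below are toy numbers, not data from this study.

```python
import math
import statistics


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error; called RMSEP when computed on the test (prediction) set."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rpd(y_true, y_pred):
    """Sample standard deviation of the reference values divided by RMSEP (assumed convention)."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


y_true = [10.0, 20.0, 30.0, 40.0]  # toy reference values (e.g. from liquid chromatography)
y_pred = [12.0, 18.0, 31.0, 39.0]  # toy model predictions
print(round(r2(y_true, y_pred), 4), round(rmse(y_true, y_pred), 4), round(rpd(y_true, y_pred), 4))
# prints 0.98 1.5811 8.165
```

Under these definitions, an RPD above 2.5 (the threshold cited in the abstract) means the spread of the reference values is more than 2.5 times the typical prediction error.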
Key words: Xylose content; Near-infrared spectroscopy; Random jumping frog algorithm; Rapid determination