Origin Discrimination and Soluble Protein Content Prediction of Dried Daylily Based on Near Infrared Spectroscopy
ZHANG Xue-li1, 2, YANG Hao1, 2, LI Chen-fei1, 2, SUN Yi-le1, 2, LIU Zong-lin1, 2, ZHENG De-cong1, 2, SONG Hai-yan1, 2*
1. College of Agricultural Engineering, Shanxi Agricultural University, Taigu 030801, China
2. Dryland Farm Machinery Key Technology and Equipment Key Laboratory of Shanxi Province, Taigu 030801, China
Abstract: Daylily is rich in nutrients and has high edible, medicinal, and economic value, and it is produced in many areas of China. Origin discrimination and soluble protein content prediction are therefore of great significance for daylily quality management, the establishment of agricultural product brands, and the development of local economies. Because fresh daylily contains a variety of alkaloids, it is not suitable for eating in large quantities, so most daylily on the market is dried. In this paper, an origin discrimination model and a soluble protein content prediction model for dried daylily were established based on near-infrared spectroscopy. To address the low discrimination accuracy and inaccurate content prediction of the original algorithm, the models were improved by combining various preprocessing methods with characteristic wavelength screening algorithms, which significantly improved accuracy. For origin discrimination, Partial Least Squares Discriminant Analysis (PLS-DA), Random Forest (RF), and Support Vector Machine (SVM) were each combined with Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), and Savitzky-Golay smoothing (SG), and the discrimination results of the resulting models were compared. The experimental results show that PLS-DA combined with MSC performs best, with an accuracy of 93.33%. The precision and recall for each of the three origins are all above 85%, with an average precision of 91.9% and an average recall of 91.9%. This demonstrates that the model has good accuracy and stability and can effectively distinguish the origin of dried daylily.
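As an illustration of two of the scatter-correction steps named above, the following is a minimal sketch of SNV and MSC in plain Python. It is not the authors' implementation: the function names `snv` and `msc` are hypothetical, the mean spectrum is assumed as the MSC reference, the sample standard deviation is assumed for SNV, and SG smoothing is omitted.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: center one spectrum on its own mean and
    scale it by its own sample standard deviation (assumed convention)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative Scatter Correction.

    Each spectrum x is regressed on a reference r (here assumed to be the
    mean spectrum), x = intercept + slope * r, and then corrected as
    (x - intercept) / slope.
    """
    n_wavelengths = len(spectra[0])
    # Reference spectrum: mean absorbance at each wavelength.
    ref = [mean(row[j] for row in spectra) for j in range(n_wavelengths)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for row in spectra:
        row_mean = mean(row)
        # Ordinary least-squares fit of this spectrum against the reference.
        slope = sum(
            (r - ref_mean) * (x - row_mean) for r, x in zip(ref, row)
        ) / ref_ss
        intercept = row_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in row])
    return corrected
```

In this sketch, spectra that differ only by an additive offset and a multiplicative gain collapse onto the same corrected curve under `msc`, which is the scatter effect the correction is meant to remove.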
For content prediction, Partial Least Squares Regression (PLSR) was combined with a variety of preprocessing methods and three characteristic wavelength screening algorithms, namely Uninformative Variable Elimination (UVE), Competitive Adaptive Reweighted Sampling (CARS), and Successive Projections Algorithm (SPA), to establish prediction models for the soluble protein content of dried daylily, and the prediction results were compared. The results show that the PLSR model combined with SG and CARS has the best predictive performance: the coefficient of determination R² reached 0.9815, and the Root Mean Square Error of Prediction (RMSEP) was 0.0214 g·kg⁻¹. Compared with the original PLSR model, R² increased by 0.12 and RMSEP decreased by 0.0331 g·kg⁻¹. This model can accurately predict the soluble protein content of dried daylily.
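The two regression metrics reported above can be computed as in the sketch below. It assumes the common definitions R² = 1 − SS_res/SS_tot and RMSEP as the root mean squared residual on the prediction set; the abstract does not state which definitions were used. The function names and the numbers in the usage note are illustrative only and are not data from the study.

```python
from math import sqrt


def rmsep(y_true, y_pred):
    """Root Mean Square Error of Prediction over a prediction (test) set."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, with made-up reference values `[1.0, 2.0, 3.0, 4.0]` and predictions `[1.1, 1.9, 3.2, 3.8]`, `r_squared` gives approximately 0.98 and `rmsep` approximately 0.158.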
张雪莉,杨 浩,李晨斐,孙一乐,刘宗霖,郑德聪,宋海燕. 基于近红外光谱的干制黄花菜产地判别及可溶性蛋白质含量预测[J]. 光谱学与光谱分析, 2025, 45(09): 2491-2495.
ZHANG Xue-li, YANG Hao, LI Chen-fei, SUN Yi-le, LIU Zong-lin, ZHENG De-cong, SONG Hai-yan. Origin Discrimination and Soluble Protein Content Prediction of Dried Daylily Based on Near Infrared Spectroscopy. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2025, 45(09): 2491-2495.