1. School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
2. Key Laboratory of Spectral Imaging Technology of Chinese Academy of Sciences, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
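Abstract: Traditional spectroscopic methods for detecting egg freshness suffer from low efficiency and insufficient accuracy. To address these problems, this study classifies egg freshness using visible-near-infrared (VIS-NIR) spectroscopy combined with extreme gradient boosting (XGBoost) and other algorithms, aiming to greatly improve detection efficiency while maintaining sufficiently high accuracy. Eggs stored under different conditions were used as samples and divided into training and test sets, and the F-measure (a comprehensive evaluation index) and accuracy on the training set were used to evaluate the performance of the classification models. Specifically, the reflectance spectra of the eggs were first collected with a VIS-NIR spectroscopy system. After different preprocessing methods, the spectral data were combined with classification algorithms including random forest (RF), partial least squares (PLS), support vector machine (SVM), multilayer perceptron (MLP) and XGBoost to build egg freshness classification models, and the performance indices of the models were compared. The analysis showed that the RF, SVM and XGBoost models with Savitzky-Golay first-derivative (SG-1st-Der) preprocessing and the PLS and MLP models with standard normal variate (SNV) preprocessing achieved better training results. To further improve model accuracy and computational efficiency, interval partial least squares (IPLS) was first used to reduce the dimensionality of the SG-1st-Der- and SNV-preprocessed spectral data; prediction models based on RF, SVM, XGBoost, PLS and MLP were then built, and finally the models were validated on the test set. The IPLS-XGBoost classification model built on SG-1st-Der-preprocessed raw spectra performed best: under the two storage conditions, its test-set F-measure was 92.33% and 90%, and its accuracy reached 94.44% and 91.67%, respectively, with a program running time of no more than 0.6 s in both cases. These results indicate that VIS-NIR spectroscopy combined with the IPLS-XGBoost classification algorithm can be applied to egg freshness evaluation, and that this approach outperforms traditional methods in classification performance, evaluation accuracy and running speed.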
Key words: visible/near-infrared spectroscopy; XGBoost algorithm; interval partial least squares; egg freshness
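The two preprocessing methods named above can be illustrated with a short NumPy sketch. SNV centres and scales each spectrum by its own mean and standard deviation; the Savitzky-Golay first derivative fits a low-order polynomial in a sliding window and differentiates the fitted polynomial. The window length (11 points), polynomial order (2) and edge handling used here are assumptions for illustration only, since the abstract does not give the authors' settings, and the function names `snv` and `sg_first_derivative` are hypothetical.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and sample std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def sg_first_derivative(spectra: np.ndarray, window: int = 11, polyorder: int = 2) -> np.ndarray:
    """Savitzky-Golay first derivative of each row, per unit sample spacing.

    The defaults (window=11, polyorder=2) are assumed, illustrative settings.
    """
    spectra = np.asarray(spectra, dtype=float)
    n = spectra.shape[1]
    if window % 2 == 0 or window <= polyorder or window > n:
        raise ValueError("window must be odd, larger than polyorder and at most the spectrum length")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix of the local polynomial fit: columns are offsets**0 ... offsets**polyorder
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    pinv = np.linalg.pinv(vander)  # maps window samples to least-squares polynomial coefficients
    powers = np.arange(1, polyorder + 1)
    out = np.empty_like(spectra)
    for i in range(n):
        # Keep the window inside the spectrum; near the edges the fit is evaluated off-centre
        start = min(max(i - half, 0), n - window)
        t = float(i - (start + half))  # position of sample i relative to the window centre
        coefs = spectra[:, start:start + window] @ pinv.T  # shape (n_samples, polyorder + 1)
        # Derivative of c0 + c1*t + c2*t**2 + ... evaluated at t
        out[:, i] = coefs[:, 1:] @ (powers * t ** (powers - 1))
    return out
```

Both functions take an `(n_eggs, n_wavelengths)` array and return an array of the same shape, so either can be applied before the band selection and classification steps.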
Abstract: In view of the low efficiency and insufficient accuracy of traditional spectroscopic methods for egg freshness testing, we propose and demonstrate an egg freshness classification method that combines visible-near-infrared (VIS-NIR) spectroscopy with XGBoost and other algorithms. In our experiments, eggs stored under different conditions were used as samples and divided into a training set and a test set for model building and evaluation. The F-measure (the weighted harmonic mean of precision and recall) and accuracy were used as the performance indices of the classification models. The reflectance spectra of the eggs were collected with a VIS-NIR spectroscopy system. The spectral data were then preprocessed and used to build egg freshness evaluation models with several classification algorithms: random forest (RF), partial least squares (PLS), support vector machine (SVM), multilayer perceptron (MLP) and XGBoost. The performance of each model was evaluated in detail. The analysis shows that the RF, SVM and XGBoost models achieve better training results with data preprocessed by the Savitzky-Golay first derivative (SG-1st-Der), and the PLS and MLP models with data preprocessed by the standard normal variate (SNV). The interval partial least squares (IPLS) method was then used to select a working waveband for dimensionality reduction, both for the RF, SVM and XGBoost models built on SG-1st-Der-preprocessed spectra and for the PLS and MLP models built on SNV-preprocessed spectra. Verification on the test set shows that the IPLS-XGBoost classification model with SG-1st-Der preprocessing performs best. Under room-temperature storage and cold storage, its F-measure reached 92.33% and 90%, and its accuracy reached 94.44% and 91.67%, respectively. Moreover, the model takes no more than 0.6 s to predict the test-set samples.
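IPLS, as used above to select a working waveband, splits the spectrum into equal-width intervals, fits a PLS model on each interval and keeps the interval with the lowest cross-validated error. The NumPy sketch below assumes a single-response PLS (NIPALS) model, freshness classes encoded as numeric targets, 10 intervals, 3 latent variables and 5-fold cross-validation; none of these settings are stated in the abstract, and `pls1_fit`, `rmsecv` and `ipls_select` are hypothetical names.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm == 0:  # nothing left to explain
            break
        w = w / norm
        t = Xc @ w  # latent scores
        tt = t @ t
        p = Xc.T @ t / tt  # X loadings
        q = (yc @ t) / tt  # y loading
        Xc = Xc - np.outer(t, p)  # deflate X and y
        yc = yc - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P^T W)^-1 q
    return x_mean, y_mean, coef


def rmsecv(X, y, n_components, n_folds=5, seed=0):
    """Root-mean-square error of k-fold cross-validated PLS predictions."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        x_mean, y_mean, coef = pls1_fit(X[train], y[train], n_components)
        pred = y_mean + (X[fold] - x_mean) @ coef
        sq_err += float(np.sum((pred - y[fold]) ** 2))
    return np.sqrt(sq_err / len(y))


def ipls_select(X, y, n_intervals=10, n_components=3, n_folds=5):
    """Return the column indices of the interval with the lowest RMSECV, plus all RMSECVs."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    scores = [rmsecv(X[:, cols], y, min(n_components, len(cols)), n_folds) for cols in intervals]
    return intervals[int(np.argmin(scores))], scores
```

Only the waveband-selection step is shown; the returned columns (`X[:, cols]`) would then be passed to the RF, SVM, XGBoost, PLS or MLP classifiers described above.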
The results show that VIS-NIR spectroscopy combined with the IPLS-XGBoost classification algorithm can be applied to egg freshness evaluation. Compared with traditional methods, this method has advantages in classification performance, evaluation accuracy and running speed.
Key words: VIS-NIR spectroscopy; interval partial least squares; XGBoost algorithm; egg freshness
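The two evaluation indices reported above can be computed directly from true and predicted freshness labels. Accuracy is the fraction of correctly classified eggs; the F-measure combines precision P and recall R as a weighted harmonic mean, F_β = (1 + β²)PR / (β²P + R), which reduces to F1 for β = 1. The abstract does not say how per-class scores are combined when there are several freshness grades, so this pure-Python sketch assumes an unweighted (macro) average over classes; `accuracy` and `macro_f_measure` are hypothetical names.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true, y_pred, beta=1.0):
    """Unweighted (macro) mean over classes of F_beta = (1 + b^2) P R / (b^2 P + R)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    b2 = beta ** 2
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = b2 * precision + recall
        scores.append((1 + b2) * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with true labels `["fresh", "fresh", "stale", "stale"]` and predictions `["fresh", "stale", "stale", "stale"]`, accuracy is 0.75 and the macro F-measure is 11/15 ≈ 0.733, showing that the two indices can diverge when errors fall unevenly across classes.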