Study on Sugar Content Detection of Kiwifruit Using Near-Infrared
Spectroscopy Combined With Stacking Ensemble Learning
GUO Zhi-qiang1, ZHANG Bo-tao1, ZENG Yun-liu2*
1. College of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
2. National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, National R&D Center for Citrus Preservation, Wuhan 430070, China
Abstract: In this study, we combine near-infrared spectroscopy with Stacking ensemble learning for non-destructive detection of sugar content in kiwifruit. Our research focuses on the “Yunhai No.1” kiwifruit variety from Hubei. Using an infrared analyzer, we collected spectra from 280 samples at 1 557 wavelengths in the 4 000~10 000 cm⁻¹ range, and measured sugar content with a refractometer. Outliers were identified and excluded using an outlier detection algorithm that combines Monte Carlo random sampling with a t-test. The SPXY algorithm was then used to split the data into training and test sets at a 4∶1 ratio. Five preprocessing methods were compared: multiplicative scatter correction (MSC), Savitzky-Golay smoothing (SG), de-trending (DT), vector normalization (VN), and standard normal variate (SNV) transformation. Feature wavelengths were first selected using uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS), and the interval variable iterative space shrinkage approach (iVISSA), followed by a secondary selection with the successive projections algorithm (SPA) to remove collinear variables. To address the limited generalization of single models, we designed an ensemble learning model based on the Stacking algorithm. This model used Bayesian ridge regression (BRR), partial least squares regression (PLSR), support vector regression (SVR), and artificial neural networks (ANN) as base learners, with linear regression (LR) as the meta-learner. We assessed the performance of different base-learner combinations and used the Pearson correlation coefficient to analyze how the base learners influence ensemble performance. Experimental results indicated that vector normalization was the most effective of the five preprocessing methods.
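As an illustration of three of the preprocessing methods named above (VN, SNV, and MSC), the following is a minimal NumPy sketch, not the authors' implementation. It assumes spectra are stored one per row, takes VN to mean scaling each spectrum to unit Euclidean norm (some software also mean-centers first), and uses the mean spectrum as the MSC reference. The function names and the synthetic spectra are hypothetical.

```python
import numpy as np


def vector_normalize(spectra):
    """VN (assumed definition): scale each spectrum (row) to unit Euclidean norm."""
    spectra = np.asarray(spectra, dtype=float)
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    return spectra / norms


def snv(spectra):
    """SNV: center each spectrum on its own mean, scale by its own std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """MSC: regress each spectrum on a reference spectrum (default: the mean
    spectrum) and remove the fitted offset and slope."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)  # row ≈ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected


# Synthetic example: one underlying curve with per-sample offset and scale,
# mimicking additive and multiplicative scatter effects (made-up data).
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
X = np.vstack([a + b * base for a, b in [(0.1, 1.0), (0.3, 1.2), (-0.2, 0.8)]])

X_vn = vector_normalize(X)
X_snv = snv(X)
X_msc = msc(X)
```

Because the synthetic spectra differ only by an offset and a scale factor, MSC maps all of them onto the mean spectrum, and SNV makes all rows identical; real spectra would retain their chemical differences after correction.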
The VN-CARS-PLSR model performed best among the single models, with a prediction-set coefficient of determination (R²P) of 0.805 and a root mean square error of prediction (RMSEP) of 0.498. It used 177 feature wavelengths, an 88.6% reduction in data volume relative to the original spectrum. Comparing different base-learner combinations in the Stacking algorithm showed that the PLSR+SVR+ANN ensemble achieved the highest predictive accuracy, with R²P of 0.853 and RMSEP of 0.433. The study concludes that the Stacking ensemble model offers more comprehensive modeling capability and better generalization than single models, providing technical support for non-destructive detection of sugar content in kiwifruit.
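The Stacking scheme described in the abstract (base learners whose out-of-fold predictions feed a linear-regression meta-learner) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' model: a closed-form ridge regressor and a k-nearest-neighbour regressor stand in for the BRR, PLSR, SVR, and ANN base learners, the data are synthetic, and a plain 4∶1 index split replaces SPXY. All class and function names are hypothetical.

```python
import numpy as np


class Ridge:
    """Closed-form ridge regression (stand-in for a linear base learner)."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.x_mean, self.y_mean = X.mean(axis=0), y.mean()
        Xc = X - self.x_mean
        A = Xc.T @ Xc + self.alpha * np.eye(X.shape[1])
        self.coef = np.linalg.solve(A, Xc.T @ (y - self.y_mean))
        return self

    def predict(self, X):
        return (X - self.x_mean) @ self.coef + self.y_mean


class KNN:
    """k-nearest-neighbour regression (stand-in for a non-linear base learner)."""
    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        dist = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :self.k]
        return self.y[nearest].mean(axis=1)


def out_of_fold(make_model, X, y, n_folds=5, seed=0):
    """Predict every training sample with a model that never saw it."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    oof = np.empty(len(y))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        oof[held_out] = make_model().fit(X[train], y[train]).predict(X[held_out])
    return oof


def fit_stacking(factories, X, y):
    """Fit the linear meta-learner on out-of-fold predictions, then refit
    each base learner on the full training set."""
    Z = np.column_stack([out_of_fold(f, X, y) for f in factories])
    design = np.column_stack([np.ones(len(y)), Z])  # intercept + base predictions
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    models = [f().fit(X, y) for f in factories]
    return models, weights, Z


def predict_stacking(models, weights, X):
    Z = np.column_stack([m.predict(X) for m in models])
    return weights[0] + Z @ weights[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic regression data (made up for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 8))
y = X[:, 0] - 0.5 * X[:, 1] + np.sin(2 * X[:, 2]) + 0.1 * rng.normal(size=150)
X_train, X_test, y_train, y_test = X[:120], X[120:], y[:120], y[120:]

factories = [lambda: Ridge(alpha=1.0), lambda: KNN(k=5)]
models, weights, Z_oof = fit_stacking(factories, X_train, y_train)
y_pred = predict_stacking(models, weights, X_test)

# Pearson correlation between the base learners' out-of-fold predictions.
base_corr = np.corrcoef(Z_oof, rowvar=False)
print(f"R2P = {r2(y_test, y_pred):.3f}, RMSEP = {rmse(y_test, y_pred):.3f}")
```

Training the meta-learner on out-of-fold predictions rather than in-sample fits keeps it from rewarding base learners that merely overfit. The correlation matrix `base_corr` gives one way to check how redundant the base learners are, in the spirit of the Pearson analysis mentioned above.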