Abstract: Non-destructive detection of agricultural product and food quality by near-infrared spectroscopy essentially amounts to building a machine learning model that relates the spectral information of a sample to its quality parameters. A model with good generalization performance usually requires a large number of labeled samples. Spectral information is relatively easy to acquire, but measuring the quality parameters used as labels is destructive and costly in both time and money. Active learning reduces the number of labeled samples needed in the training set by selecting the most valuable samples for labeling rather than selecting them at random. It thus controls which samples enter the training set, so the model no longer passively accepts whatever samples it is given. Many active learning algorithms exist for classification tasks, but comparatively few for regression tasks. Moreover, most existing active learning algorithms for regression are supervised, that is, they need a small number of labeled samples to train an initial model. This paper proposes a training sample selection strategy based on unsupervised active learning. First, hierarchical agglomerative clustering partitions the unlabeled spectral dataset (spectra without reference values) into clusters that capture the diversity of the samples. Then, a locally linear reconstruction method selects the most representative samples in each cluster to form the training set, and a partial least squares regression model built on this training set is used to predict the unlabeled samples. To evaluate the proposed method, partial least squares prediction models for soluble solids content and firmness were constructed from near-infrared spectral data of three apple varieties collected in two years.
The experimental results show that the proposed method outperforms the existing sample selection strategies: it effectively improves model accuracy and reduces the destructive physical and chemical experiments needed for model training. Compared with random sampling (RS), the traditional Kennard-Stone (KS) algorithm, and sample set partitioning based on joint x-y distances (SPXY), the proposed method achieved the best performance. With 200 samples selected as the training set, the root mean square error of the soluble solids content prediction models based on the proposed unsupervised active learning algorithm was 2.0%~13.2% lower than that of the other three algorithms, and the root mean square error of the firmness prediction models was 1.2%~15.7% lower.
Key words: Spectroscopy; Quality detection; Active learning; Training sample selection
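The two-stage selection idea described in the abstract (cluster the unlabeled spectra, then take a representative from each cluster) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes average-linkage agglomerative clustering, and it swaps in a simple medoid rule (the member closest to all others in its cluster) in place of the locally linear reconstruction criterion. The partial least squares regression step is omitted. All function names and the synthetic 4-band "spectra" are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two spectra given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(spectra, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of sample indices."""
    n = len(spectra)
    dist = [[euclidean(spectra[i], spectra[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # every sample starts as its own cluster

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def medoid(spectra, members):
    """Index of the member with the smallest total distance to the other members.

    Assumption: a stand-in for the locally linear reconstruction criterion.
    """
    return min(
        members,
        key=lambda i: sum(euclidean(spectra[i], spectra[j]) for j in members),
    )


def select_training_samples(spectra, n_select):
    """Pick one representative per cluster; no quality labels are used."""
    clusters = agglomerative_clusters(spectra, n_select)
    return sorted(medoid(spectra, members) for members in clusters)


# Toy data: three well-separated groups of synthetic 4-band "spectra" (not real NIR data).
rng = random.Random(0)
toy_spectra = [
    [base + rng.uniform(-0.05, 0.05) for _ in range(4)]
    for base in (0.2, 0.5, 0.8)
    for _ in range(5)
]
selected = select_training_samples(toy_spectra, n_select=3)
print(selected)  # one index from each of the three groups
```

In a real workflow, only the selected samples would be sent for destructive reference measurement, and a regression model such as partial least squares would then be fitted on them. The pairwise search above is quadratic per merge and is only meant for small illustrative datasets.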
赵小康,赵 鑫,朱启兵,黄 敏. 一种基于无监督主动学习的苹果品质光谱无损检测模型构建方法[J]. 光谱学与光谱分析, 2022, 42(01): 282-291.
ZHAO Xiao-kang, ZHAO Xin, ZHU Qi-bing, HUANG Min. A Model Construction Method of Spectral Nondestructive Detection for Apple Quality Based on Unsupervised Active Learning. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2022, 42(01): 282-291.