Abstract: Hawthorns from different origins vary in quality because of differences in growing environment and regional climate, so determining their geographic origin is of great significance. To improve the stability and accuracy of hawthorn origin traceability, a combined identification model based on error reciprocal weighting was proposed. First, hyperspectral data of 456 hawthorns were collected using hyperspectral imaging. Three preprocessing methods, Savitzky-Golay convolution smoothing (SG), multiplicative scatter correction (MSC), and standard normal variate (SNV), were compared by building BP neural network (BPNN) and random forest (RF) models on the preprocessed data and the raw data; based on model accuracy, SNV was selected as the preprocessing method for the average spectral values. Next, principal component analysis was applied to the hyperspectral images of the hawthorns and the first principal component image was selected; at the same time, six feature wavelengths were screened according to the weight coefficients over the full wavelength band, and the corresponding average spectral values were used as the representation values of the spectral information. Then, texture features were extracted from the first principal component image and from the grayscale images at the feature wavelengths, and the spectral representation values of the feature wavelengths were combined with the texture representation values of the feature-wavelength grayscale images and of the principal component image to construct the input vectors of the origin traceability identification model.
Finally, three models, BPNN, RF, and a weighted combination model (BPNN-RF), were built, and two evaluation indexes, accuracy (Acc) and macro-F1 score (macroF1), were used to evaluate the hawthorn origin identification models constructed from different input vectors. The results showed that, for the same input vector, the accuracy and macroF1 score of the BPNN-RF model were in most cases better than those of the BPNN and RF models. For the BPNN-RF model whose input vector consisted of all three kinds of representation values, the accuracy on the actual test data set increased from 89.01% to 98.90%, and the macroF1 score increased from 89.32% to 98.95%. This indicates that the combined BPNN-RF model based on error reciprocal weighting has the strongest discriminative ability and the best performance in identifying hawthorn origin, outperforming single discriminative models such as BPNN or RF. This study provides methodological support for tracing hawthorn origin using hyperspectral information alone, without relying on physicochemical analysis.
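As an illustration only, the error-reciprocal weighting idea and the macroF1 index can be sketched in a few lines of Python. The abstract does not state which error measure or which model outputs are combined, so this sketch assumes each model's validation error rate serves as its error and that class probabilities are combined. The function names and all numbers are hypothetical and are not taken from the study.

```python
def reciprocal_error_weights(errors):
    """Weights proportional to 1/error, normalised so they sum to 1.

    Assumption: `errors` holds one positive error rate per model
    (for example, the misclassification rate on a validation set).
    """
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(prob_sets, weights):
    """Weighted sum of the per-model class probabilities for one sample."""
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_sets))
        for k in range(n_classes)
    ]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical validation error rates for BPNN and RF (not from the paper).
weights = reciprocal_error_weights([0.10, 0.15])

# Hypothetical class probabilities for one sample over three origins.
bpnn_probs = [0.55, 0.45, 0.00]
rf_probs = [0.30, 0.70, 0.00]
combined = combine([bpnn_probs, rf_probs], weights)

# The predicted origin is the class with the largest combined probability.
predicted = max(range(len(combined)), key=combined.__getitem__)

# Toy labels to exercise the evaluation index.
score = macro_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

In this toy case the model with the lower error receives the larger weight (about 0.6 against 0.4), yet the combined probabilities can still favour the class preferred by the other model when that model is much more confident. For two models, the normalised reciprocal weights reduce to w1 = e2 / (e1 + e2) and w2 = e1 / (e1 + e2).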