Infrared Spectral Study on the Origin Identification of Boletus Tomentipes Based on the Random Forest Algorithm and Data Fusion Strategy
HU Yi-ran1, LI Jie-qing1, LIU Hong-gao2, FAN Mao-pan1*, WANG Yuan-zhong3*
1. College of Resources and Environment, Yunnan Agricultural University, Kunming 650201, China
2. College of Agronomy and Biotechnology, Yunnan Agricultural University, Kunming 650201, China
3. Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences, Kunming 650200, China
Abstract: Boletus tomentipes Earle is favored by consumers as a healthy food. Nutrient accumulation in the fruiting body is affected by the growth environment (altitude, climate, etc.), and nutrient content differs significantly between regions, so an accurate, rapid and inexpensive origin identification technique is urgently needed. In this paper, a data fusion strategy combined with the random forest (RF) algorithm was used to identify the origin of B. tomentipes, and the effects of several feature extraction methods on the classification performance of the RF models were compared. Fourier transform near-infrared and Fourier transform mid-infrared spectra of 87 samples from 4 producing areas (north subtropical, north temperate, south subtropical and middle subtropical zones) were collected and their spectral characteristics were analyzed. The samples were divided by the Kennard-Stone algorithm into a training set (58 samples, two thirds) and a validation set (29 samples, one third). RF identification models were built on four kinds of infrared spectra (near-infrared average spectra of stipes (N-b), near-infrared average spectra of caps (N-g), mid-infrared average spectra of stipes (M-b) and mid-infrared average spectra of caps (M-g)) and on three data fusion strategies (low-level, mid-level and high-level fusion), and the effects of different feature types (variable importance in projection, Boruta, latent variables) on the classification results were compared. The optimal ntree and mtry were selected according to the out-of-bag (OOB) error. The classification performance of each model was evaluated by specificity, sensitivity, training set accuracy and validation set accuracy. Finally, the best method for identifying the origin of B. tomentipes was chosen on the basis of these evaluation indicators.
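As an illustration of the sample partitioning step, the following is a minimal sketch of a Kennard-Stone selection, assuming Euclidean distance between spectra. The function name `kennard_stone` and the toy two-variable "spectra" are hypothetical and are not taken from the paper; the real data set has 87 spectra with many more variables.

```python
import math

def kennard_stone(samples, n_train):
    """Return indices of n_train samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances (assumed metric).
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Seed the training set with the two mutually most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_train:
        # Add the candidate whose nearest already-selected sample is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Hypothetical toy data: six samples, two variables each.
toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (5.0, 5.0), (0.5, 0.5)]
train_idx = kennard_stone(toy, 4)
valid_idx = [k for k in range(len(toy)) if k not in train_idx]
print("training:", train_idx, "validation:", valid_idx)
```

With the 87 samples described above, a call such as `kennard_stone(spectra, 58)` would leave the remaining 29 samples for validation.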
The results showed that (1) near-infrared and mid-infrared spectra can both be used to identify the origin of B. tomentipes; (2) a discriminant model built from a single spectrum combined with RF does not perform well; (3) all three fusion strategies improve the origin identification of B. tomentipes, with identification performance ranked from best to worst as high-level fusion, mid-level fusion, low-level fusion. Using the near-infrared and mid-infrared spectra of B. tomentipes, a high-level fusion strategy based on latent variable (LV) features was adopted and an RF identification model for B. tomentipes from different regions was established, which gave high validation set accuracy (99.6%), high sensitivity (0.969) and high specificity (0.986). This is a reliable method that can identify the geographical origin of B. tomentipes quickly and accurately.
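The sketch below illustrates the idea of decision-level (high-level) fusion and the one-vs-rest sensitivity and specificity used as evaluation indicators. It assumes a simple majority vote over the predictions of separate models; the abstract does not state the actual voting rule, and the function names, class labels and predictions here are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Fuse per-model label lists into one label per sample by majority vote."""
    fused = []
    for votes in zip(*predictions_per_model):
        # On a tie, the label seen first (the earliest model) is kept.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels for six validation samples from three origins A, B, C.
y_true = ["A", "A", "B", "B", "C", "C"]
# Hypothetical predictions of three single-spectrum models.
model_1 = ["A", "A", "B", "C", "C", "C"]
model_2 = ["A", "B", "B", "B", "C", "A"]
model_3 = ["A", "A", "A", "C", "C", "C"]

fused = majority_vote([model_1, model_2, model_3])
for origin in ("A", "B", "C"):
    sens, spec = sensitivity_specificity(y_true, fused, origin)
    print(origin, "sensitivity:", sens, "specificity:", spec)
```

In this toy case the vote corrects the isolated errors of individual models, but an error shared by two of the three models survives the fusion.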
Key words: Boletus tomentipes; Geographic origin identification; Data fusion; Fourier transform mid-infrared spectrum; Fourier transform near-infrared spectrum
胡翼然,李杰庆,刘鸿高,范茂攀,王元忠. 红外光谱的随机森林算法与数据融合策略对绒柄牛肝菌产地鉴别[J]. 光谱学与光谱分析, 2020, 40(05): 1495-1502.
HU Yi-ran, LI Jie-qing, LIU Hong-gao, FAN Mao-pan, WANG Yuan-zhong. Infrared Spectral Study on the Origin Identification of Boletus Tomentipes Based on the Random Forest Algorithm and Data Fusion Strategy. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2020, 40(05): 1495-1502.