Near-Infrared Spectroscopy Combined With Random Forest Algorithm: A Fast and Effective Strategy for Origin Traceability of Fuzi
GONG Sheng1, ZHU Ya-ning2, ZENG Chen-juan3, MA Xiu-ying3, PENG Cheng1, GUO Li1*
1. State Key Laboratory of Southwestern Chinese Medicine Resources, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
2. Yaan Sanjiu Pharmaceutical Co., Ltd., Yaan 625000, China
3. Sichuan Jianengda Panxi Pharmaceuticals Industry Co., Ltd., Butuo 616350, China
Abstract: Effective and reliable methods of origin certification are essential for protecting high-value Chinese medicinal materials from designated regions (e.g., geo-authentic Chinese medicinal materials and geographical indication products). As a famous traditional Chinese medicine and a geo-authentic Chinese medicinal material produced in Sichuan Province, Aconiti Lateralis Radix Praeparata (Fuzi) has a remarkable curative effect and wide clinical application, and it is in great demand in domestic and international markets. The efficacy and price of Fuzi vary with origin, and it is difficult for the public to identify the origin accurately through traditional experience. Mass spectrometry-based plant metabolomics suffers from tedious and lengthy sample preparation, complicated operation, long detection time, and low reproducibility. Near-infrared (NIR) spectroscopy, a mature, fast and nondestructive detection technique, can be integrated with machine learning to provide new approaches for online quality supervision and control of Chinese medicinal materials. Therefore, a nondestructive identification model based on NIR spectroscopy combined with a random forest (RF) algorithm was developed for Fuzi of different origins. A total of 255 samples of Fuzi were collected from the major cultivation regions of Sichuan, Shaanxi and Yunnan, and the diffuse reflectance spectra of all samples were acquired using Fourier transform NIR spectroscopy. Single and combined spectral preprocessing methods were used to eliminate multiple interferences in the spectra; the best preprocessing method was selected, and the resulting preprocessed spectra were used as input to build the RF model. The comprehensive performance of the RF model was evaluated using sensitivity, specificity and balanced accuracy.
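As an illustration of the three evaluation metrics named above, the following minimal sketch computes them per class from a confusion matrix in a one-vs-rest fashion. It assumes the common definition of balanced accuracy as the mean of sensitivity and specificity; the abstract does not state the exact formula used. The function name `one_vs_rest_metrics` and the confusion-matrix counts are hypothetical and are not data from the study.

```python
from typing import Dict, List


def one_vs_rest_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Per-class sensitivity, specificity and balanced accuracy.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    results: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                          # class k predicted as k
        fn = sum(confusion[k]) - tp                   # class k predicted as other
        fp = sum(row[k] for row in confusion) - tp    # other predicted as k
        tn = total - tp - fn - fp                     # everything else
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results[label] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            # Assumed definition: mean of sensitivity and specificity.
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }
    return results


# Hypothetical counts for three provincial classes (illustration only).
labels = ["Sichuan", "Shaanxi", "Yunnan"]
confusion = [
    [20, 0, 0],
    [1, 18, 1],
    [0, 0, 20],
]
results = one_vs_rest_metrics(confusion, labels)
for name, values in results.items():
    print(name, {key: round(val, 4) for key, val in values.items()})
```

Reporting the metrics per class rather than as a single overall accuracy makes it visible when one origin is systematically confused with another, which matters when class sizes are unequal.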
The results showed that Savitzky-Golay 11-point smoothing combined with multiplicative scatter correction (MSC) was the best preprocessing method. Using only the unprocessed full-wavelength data, the prediction accuracy of the RF model for the three groups of provincial samples already exceeded 90%, and the prediction accuracy after preprocessing reached 98.39%. For the city/county-level samples, the RF model also showed excellent discriminative ability, with accuracy greater than 75%. The RF model achieved a 100% recognition rate for samples from cultivation areas around the traditional production areas. After the top 100 feature wavenumbers were selected and the model was re-optimized, the recognition accuracy for each city/county-level region exceeded 85%, and the accuracy for some samples from highland areas improved significantly. In this study, an environment-friendly traceability strategy with faster analysis, less sample loss and higher precision was adopted, providing a new model for the rapid and efficient identification of Fuzi of different origins and a reference for the subsequent identification and traceability of Fuzi and its related processed products.
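The scatter-correction step named in the results can be sketched as follows. This is a minimal pure-Python illustration of the textbook form of multiplicative scatter correction, assuming the mean spectrum is used as the reference. The Savitzky-Golay smoothing step is not shown, because the abstract gives the window size (11 points) but not the polynomial order. The function name `msc` and the toy spectra are hypothetical.

```python
from typing import List


def msc(spectra: List[List[float]]) -> List[List[float]]:
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is regressed on the reference r (x ~ intercept + slope * r)
    and then corrected as (x - intercept) / slope.
    """
    n_samples = len(spectra)
    n_points = len(spectra[0])
    # Assumption: the reference is the mean spectrum of the supplied set.
    reference = [
        sum(s[j] for s in spectra) / n_samples for j in range(n_points)
    ]
    ref_mean = sum(reference) / n_points
    ref_var = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        # Ordinary least-squares slope and intercept of s on the reference.
        slope = sum(
            (r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)
        ) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Toy spectra: one underlying shape with different additive and
# multiplicative distortions (illustration only, not measured data).
base = [0.2, 0.5, 0.9, 0.4]
spectra = [
    base,
    [2.0 * b + 0.1 for b in base],
    [0.5 * b - 0.05 for b in base],
]
corrected = msc(spectra)
for row in corrected:
    print([round(v, 4) for v in row])
```

Because the toy spectra differ only by an offset and a scale factor, the corrected rows coincide, which is the behavior MSC is designed to produce for pure scatter effects. In a real workflow the reference spectrum would typically be computed on the training set only and reused for new samples.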
Key words: Fuzi; Origin; Traceability; Near-infrared spectroscopy; Machine learning; Random forest
龚 圣,朱雅宁,曾陈娟,马秀英,彭 成,郭 力. 近红外光谱结合随机森林算法:一种快速有效的附子产地溯源策略[J]. 光谱学与光谱分析, 2022, 42(12): 3823-3829.
GONG Sheng, ZHU Ya-ning, ZENG Chen-juan, MA Xiu-ying, PENG Cheng, GUO Li. Near-Infrared Spectroscopy Combined With Random Forest Algorithm: A Fast and Effective Strategy for Origin Traceability of Fuzi. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2022, 42(12): 3823-3829.