Identification of the Age of Puerariae Thomsonii Radix Based on Hyperspectral Imaging and Principal Component Analysis
HU Hui-qiang1, WEI Yun-peng1, XU Hua-xing1, ZHANG Lei2, MAO Xiao-bo1*, ZHAO Yun-ping2*
1. School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China
2. National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100020, China
Abstract: Puerariae Thomsonii Radix is a plant of high medicinal and edible value that contains puerarin, starch, cellulose, vitamins and other components. Extensive research has shown that the content of its chemical components is closely related to the growth period. However, much of the research to date has been descriptive, and traditional identification techniques are slow and highly destructive, so they cannot be applied on a large scale. The development of hyperspectral imaging (HSI) has provided new possibilities for rapid, non-destructive identification of the age of Puerariae Thomsonii Radix. To avoid quality problems caused by insufficient growth years, this experiment combined hyperspectral imaging with machine learning to identify the growth years of Puerariae Thomsonii Radix accurately. One major drawback of this approach is that hyperspectral image data contain a great deal of redundant information: the large data volume and the high correlation between characteristic bands make sample identification more difficult. Principal component analysis (PCA) was therefore used to extract features from the data and limit the impact of this redundancy on subsequent classification. Four classification models were built on both the full-band data and the PCA-reduced data to identify samples of different ages: support vector machine (SVM), logistic regression (LR), multi-layer perceptron (MLP) and random forest (RF). With full-band data, the accuracies of the four models (SVM, LR, MLP and RF, in that order) were 78.09%, 77.03%, 81.43% and 72.09% under the SN0605VNIR (VNIR) lens and 93.11%, 93.79%, 94.23% and 89.77% under the N3124SWIR (SWIR) lens. The MLP model achieved the best result under both lenses.
With PCA-reduced data, the test-set accuracies of the four models (SVM, LR, MLP and RF, in that order) were 96.12%, 87.53%, 95.02% and 93.41% under the VNIR lens and 99.26%, 97.09%, 99.16% and 97.91% under the SWIR lens, with SVM achieving the best prediction accuracy under both lenses. These results show that PCA can effectively improve the models' prediction accuracy. To explore the influence of the proportion of principal components on prediction accuracy, the model parameters were analyzed further. Under the VNIR lens, the four models reached their highest test-set accuracy with principal-component proportions of 65%, 75%, 80% and 45%, respectively; under the SWIR lens, the corresponding proportions were 20%, 60%, 35% and 30%. Among them, PCA-SVM performed best overall, achieving high prediction accuracy (99.28%) with 20% of the principal components. Hyperspectral imaging combined with machine learning is therefore a promising route to rapid, non-destructive and high-precision identification of the age of Puerariae Thomsonii Radix.
Key words: Hyperspectral imaging; Identification of Puerariae Thomsonii Radix growth years; Machine learning; Principal component analysis
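The workflow summarized in the abstract (four classifiers trained on full-band spectra and on PCA-reduced spectra) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses synthetic spectra generated by a hypothetical helper `make_synthetic_spectra` in place of the real hyperspectral data, picks arbitrary hyperparameters, and interprets the "proportion of principal components" as a fraction of the maximum number of components available (`pc_fraction`), which is one possible reading of the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_spectra(n_per_class=60, n_bands=200, n_classes=3, seed=0):
    """Hypothetical stand-in for per-sample mean spectra; NOT the paper's data."""
    rng = np.random.default_rng(seed)
    bands = np.linspace(0.0, 1.0, n_bands)
    X, y = [], []
    for label in range(n_classes):
        # Each class is a smooth curve with a slightly shifted absorption dip.
        center = 0.3 + 0.15 * label
        base = 1.0 - 0.3 * np.exp(-((bands - center) ** 2) / 0.01)
        noise = rng.normal(scale=0.05, size=(n_per_class, n_bands))
        X.append(base + noise)
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)


def build_models(seed=0):
    """The four model families named in the abstract; settings are assumptions."""
    return {
        "SVM": SVC(kernel="rbf"),
        "LR": LogisticRegression(max_iter=1000),
        "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                             random_state=seed),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def evaluate(X, y, pc_fraction=None, seed=0):
    """Return test accuracy per model; pc_fraction=None means full-band data."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scores = {}
    for name, clf in build_models(seed).items():
        steps = [StandardScaler()]
        if pc_fraction is not None:
            # Assumption: the fraction is taken relative to the largest
            # number of components PCA can produce on the training set.
            max_pcs = min(X_tr.shape)
            n_pcs = max(1, int(round(pc_fraction * max_pcs)))
            steps.append(PCA(n_components=n_pcs))
        steps.append(clf)
        pipe = make_pipeline(*steps)
        pipe.fit(X_tr, y_tr)
        scores[name] = pipe.score(X_te, y_te)
    return scores


X, y = make_synthetic_spectra()
full_band = evaluate(X, y)
with_pca = evaluate(X, y, pc_fraction=0.2)
for name in full_band:
    print(f"{name}: full-band={full_band[name]:.3f}, PCA(20%)={with_pca[name]:.3f}")
```

Sweeping `pc_fraction` over a grid (for example 0.05 to 1.0) and recording the best test accuracy per model would mirror the parameter analysis described in the abstract. Accuracies on this toy data say nothing about the reported results.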
HU Hui-qiang, WEI Yun-peng, XU Hua-xing, ZHANG Lei, MAO Xiao-bo, ZHAO Yun-ping. Identification of the Age of Puerariae Thomsonii Radix Based on Hyperspectral Imaging and Principal Component Analysis[J]. Spectroscopy and Spectral Analysis, 2023, 43(06): 1953-1960.