Application of High-Dimensional Infrared Spectral Data Preprocessing in the Origin Identification of Traditional Chinese Medicinal Materials
JIN Cheng-liang1, WANG Yong-jun2*, HUANG He2, LIU Jun-min3
1. School of Information and Engineering, Wenzhou Business College, Wenzhou 325035, China
2. School of Artificial Intelligence, Wenzhou Polytechnic, Wenzhou 325035, China
3. School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China
Abstract: To improve origin identification of Chinese medicinal materials from high-dimensional infrared spectroscopic data, appropriate data preprocessing (DP) should be applied first, and more advanced algorithms considered only if necessary. Using a dataset of 658 samples with wavelengths from 551 to 3 998 nm and a support vector machine (SVM) algorithm, ten sample-based DP methods (namely non-DP, maximum-minimum normalization, standardization, centralization, moving average smoothing, Savitzky-Golay (SG) smoothing filtering, multiplicative scatter correction, regularization, first-order derivative, and second-order derivative), five spectral-feature-based methods (i.e., non-DP, centralization, maximum-minimum normalization, standardization, and regularization), and their combinations (50 in total) were investigated according to prediction effectiveness and stability. Numerical results show that an appropriate DP method helps improve model accuracy. Among the 10 sample-based methods, the standard variate and Max-Min average DP methods achieve higher scores (R² of approximately 85%). Feature-based-only methods yield little model improvement. Sample-based-only and feature-based-only methods reach approximately the same average score of 64%. Combined methods in which standard normal variate or normalization processing is followed by second-order derivative DP achieve the highest prediction score, with R² of nearly 94%. By contrast, data regularization combined with centralization performs worst. Practical suggestions are also given. The research is valuable for further analysis of medicinal efficacy and chemical composition. It can also serve as a reference for infrared spectral data analysis and for modeling high-dimensional, small-sample data.
Keywords: Origin identification of Chinese medicinal materials; Infrared spectroscopic data; Data preprocessing; High-dimensional small sample; SVM algorithm
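The distinction the abstract draws between sample-based and feature-based preprocessing can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy spectra, the use of the population standard deviation in the standard normal variate step, and the plain central-difference second derivative are all assumptions made for illustration. A sample-based method transforms each spectrum (row) on its own, whereas a feature-based method transforms each wavelength (column) across all samples.

```python
from statistics import mean, pstdev


def snv(spectrum):
    """Standard normal variate: centre and scale ONE spectrum.

    Assumption: population standard deviation is used; some
    implementations use the sample standard deviation instead.
    """
    m = mean(spectrum)
    s = pstdev(spectrum)
    if s == 0:
        return [0.0 for _ in spectrum]
    return [(x - m) / s for x in spectrum]


def min_max(values):
    """Maximum-minimum normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def second_derivative(spectrum):
    """Second-order central difference; the result is two points shorter."""
    return [
        spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    ]


def sample_based(X, transform):
    """Apply `transform` to every row, i.e. to each sample's spectrum."""
    return [transform(list(row)) for row in X]


def feature_based(X, transform):
    """Apply `transform` to every column, i.e. to one wavelength across samples."""
    columns = [transform(list(col)) for col in zip(*X)]
    return [list(row) for row in zip(*columns)]


# Toy data (hypothetical): 3 samples x 5 wavelengths.
# Row 1 is row 0 multiplied by 2, mimicking a pure intensity scaling.
X = [
    [1.0, 2.0, 4.0, 7.0, 11.0],
    [2.0, 4.0, 8.0, 14.0, 22.0],
    [0.5, 0.7, 0.6, 0.9, 0.8],
]

# Combined sample-based pipeline: SNV followed by the second derivative.
combined = sample_based(sample_based(X, snv), second_derivative)

# Feature-based alternative: scale each wavelength across all samples.
scaled_columns = feature_based(X, min_max)
```

In this toy case the first two spectra differ only by a multiplicative factor, so SNV maps them to the same values, which is one intuition for why a per-sample correction can help before a classifier such as an SVM is trained. The preprocessed matrix would then be passed to the classifier of choice.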
JIN Cheng-liang, WANG Yong-jun, HUANG He, LIU Jun-min. Application of High-Dimensional Infrared Spectral Data Preprocessing in the Origin Identification of Traditional Chinese Medicinal Materials. Spectroscopy and Spectral Analysis, 2023, 43(07): 2238-2245.