Abstract: Blood carries a wealth of biological information in components such as hormones, enzymes, and glucose. High blood glucose is the hallmark of diabetes, a disease with many complications, including cerebral infarction, cerebral hemorrhage, kidney damage, fundus damage, and peripheral neuropathy. At present, routine analysis of blood components has a long turnaround, so feedback is slow and rapid, continuous monitoring is hard to achieve. Optical detection techniques identify the chemical composition and relative content of a substance from its spectrum. Owing to their high sensitivity, broad applicability, and fast analysis, they are increasingly used for non-invasive blood testing. With continuing advances in laser technology, Raman spectroscopy, an inelastic light-scattering technique, has been widely used in blood analysis. To improve the prediction accuracy of Raman spectroscopy, this paper applies the XGBoost algorithm, for the first time, to predicting blood glucose concentration from Raman spectra. The 106 blood samples and their reference concentrations were provided by the First Hospital of Qinhuangdao City, Hebei Province. Blood Raman spectra were measured with a Bruker MultiRAM spectrometer using a 1064 nm excitation source at 400 mW, a spectral resolution of 6 cm⁻¹, a scanning rate of 10 kHz, and a scanning range of 400 to 4000 cm⁻¹. Each sample was measured 10 times, and the average was taken as the raw spectrum to ensure accuracy and repeatability. The method requires no data preprocessing. First, the spectral data were randomly divided into a training set and a test set at a ratio of 7:3.
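The sample-preparation steps above (averaging the 10 repeated acquisitions per sample, then a random 7:3 split) can be sketched in plain Python. This is a minimal illustration under stated assumptions: each spectrum is a list of intensities on a shared wavenumber axis, and the helper names `average_scans` and `random_split` and the fixed seed are hypothetical, not taken from the paper. With 106 samples, rounding 0.7 × 106 gives 74 training and 32 test samples; the paper does not state its exact counts.

```python
import random
from statistics import fmean


def average_scans(scans):
    """Average repeated acquisitions of one sample, point by point."""
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must share the same wavenumber axis")
    # zip(*scans) groups the k-th intensity of every scan together
    return [fmean(values) for values in zip(*scans)]


def random_split(spectra, targets, train_ratio=0.7, seed=0):
    """Shuffle sample indices and split them; train_ratio=0.7 gives 7:3."""
    indices = list(range(len(spectra)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumption)
    n_train = round(train_ratio * len(indices))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    return ([spectra[i] for i in train_idx], [spectra[i] for i in test_idx],
            [targets[i] for i in train_idx], [targets[i] for i in test_idx])
```

Splitting indices rather than the lists themselves keeps each spectrum paired with its reference concentration.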
The training set was used to train the model and determine its parameters, and the test set was used to verify the model's stability and prediction accuracy. An XGBoost model was then built, and its parameters were optimized by grid search with k-fold cross-validation. The model's blood glucose predictions were assessed with standard evaluation metrics and Clarke error grid analysis. Finally, the XGBoost model was compared with Decision Tree (DT), Random Forest (RF), and Support Vector Regression (SVR) models. The results showed that the quantitative regression model built with XGBoost performed best: its coefficient of determination was 0.99999, the mean squared error was 0.00749 on the calibration set and 0.00717 on the prediction set, and the relative analysis error (RPD) was 331.97318. The prediction points fell within zone A of the Clarke error grid. These results indicate that XGBoost achieves high prediction accuracy in the quantitative analysis of blood components by Raman spectroscopy, and because the data need no preprocessing, the program's running time is effectively shortened. The method has broad prospects for quantitative analysis in Raman and near-infrared spectroscopy.
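The paper uses XGBoost with a grid search scored by k-fold cross-validation. As a hedged illustration of what that pairing does, the sketch below re-implements a heavily simplified XGBoost-style regressor: depth-1 trees (stumps) whose splits are scored with XGBoost's second-order gain and whose leaf weights are the L2-regularized optimum -G/(H+λ) for squared-error loss, wrapped in an exhaustive grid search that keeps the parameters with the lowest mean validation MSE. Everything here (function names, stumps instead of deeper trees, the target mean as starting prediction, interleaved unshuffled folds) is an assumption for illustration, not the authors' code; in practice one would use the `xgboost` package's `XGBRegressor` together with scikit-learn's `GridSearchCV`.

```python
from itertools import product
from statistics import fmean


def fit_stump(X, grad, hess, lam, gamma):
    """Fit a depth-1 tree with XGBoost's second-order split gain."""
    G, H = sum(grad), sum(hess)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            GL += grad[i]
            HL += hess[i]
            if X[i][j] == X[nxt][j]:
                continue  # no threshold fits between equal feature values
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                          - G ** 2 / (H + lam)) - gamma
            if best is None or gain > best[0]:
                threshold = (X[i][j] + X[nxt][j]) / 2
                best = (gain, j, threshold, -GL / (HL + lam), -GR / (HR + lam))
    if best is None or best[0] <= 0:
        weight = -G / (H + lam)  # no split is worth it: a single leaf
        return lambda x: weight
    _, j, threshold, w_left, w_right = best
    return lambda x: w_left if x[j] < threshold else w_right


def fit_boosted(X, y, n_estimators, learning_rate, lam=1.0, gamma=0.0):
    """Boost stumps on squared-error loss; returns a predict(x) function."""
    base = fmean(y)  # starting prediction (assumption: the target mean)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_estimators):
        grad = [p - t for p, t in zip(pred, y)]  # d/dp of 0.5 * (t - p)^2
        hess = [1.0] * len(y)                   # second derivative of that loss
        tree = fit_stump(X, grad, hess, lam, gamma)
        trees.append(tree)
        pred = [p + learning_rate * tree(x) for p, x in zip(pred, X)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


def kfold_grid_search(X, y, grid, k=5):
    """Pick the grid point with the lowest mean validation MSE over k folds."""
    folds = [list(range(f, len(X), k)) for f in range(k)]  # interleaved folds
    keys = sorted(grid)
    best_params, best_mse = None, float("inf")
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        fold_mses = []
        for val_idx in folds:
            held_out = set(val_idx)
            tr_idx = [i for i in range(len(X)) if i not in held_out]
            model = fit_boosted([X[i] for i in tr_idx],
                                [y[i] for i in tr_idx], **params)
            fold_mses.append(fmean([(model(X[i]) - y[i]) ** 2 for i in val_idx]))
        score = fmean(fold_mses)
        if score < best_mse:
            best_params, best_mse = params, score
    return best_params, best_mse
```

For example, `kfold_grid_search(X_train, y_train, {"n_estimators": [50, 100], "learning_rate": [0.1, 0.3]})` would try all four combinations and return the best one with its cross-validated MSE. Deeper trees, column subsampling, and XGBoost's other regularizers are left out so the gain and leaf-weight formulas stay visible.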
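The evaluation quantities reported above can be computed as follows. This sketch uses one common set of definitions, which is an assumption since the abstract gives no formulas: R² = 1 - SS_res/SS_tot, MSE as the mean squared residual, RPD as the sample standard deviation of the reference values divided by the root-mean-square error of prediction, and Clarke zone A as a prediction within 20% of the reference or both values below 70 mg/dL. Treating the concentrations as mmol/L and converting with a factor of about 18 are likewise assumptions; the function names are hypothetical.

```python
import math
from statistics import fmean, stdev


def regression_metrics(y_true, y_pred):
    """Return MSE, R^2 and RPD for paired reference/predicted values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = fmean([r * r for r in residuals])
    mean_y = fmean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(mse)
    # RPD: spread of the reference values relative to the prediction error
    rpd = stdev(y_true) / rmse if rmse > 0 else math.inf
    return {"mse": mse, "r2": r2, "rpd": rpd}


MMOL_TO_MGDL = 18.0  # approximate glucose conversion factor (assumption)


def in_clarke_zone_a(ref_mgdl, pred_mgdl):
    """Clarke zone A: within 20% of reference, or both hypoglycemic (<70)."""
    both_hypo = ref_mgdl < 70 and pred_mgdl < 70
    within_20pct = abs(pred_mgdl - ref_mgdl) <= 0.2 * ref_mgdl
    return both_hypo or within_20pct


def fraction_in_zone_a(ref_mmol, pred_mmol):
    """Share of predictions in zone A, for inputs given in mmol/L."""
    hits = [in_clarke_zone_a(r * MMOL_TO_MGDL, p * MMOL_TO_MGDL)
            for r, p in zip(ref_mmol, pred_mmol)]
    return sum(hits) / len(hits)
```

Running `regression_metrics` separately on the calibration and prediction sets gives the two MSE values the abstract contrasts.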
WANG Ming-xuan, WANG Qiao-yun, PIAN Fei-fei, SHAN Peng, LI Zhi-gang, MA Zhen-he. Quantitative Analysis of Diabetic Blood Raman Spectroscopy Based on XGBoost. Spectroscopy and Spectral Analysis, 2022, 42(6): 1721-1727.