Abstract: Raman spectroscopy equipment is gradually coming into use on the front line of public security, mainly for detecting flammable, explosive and drug-precursor chemicals. However, operators without professional training may not perform detection under optimal conditions. Frequent problems such as defocusing, offsetting and short sampling times can strongly affect the final spectral comparison. In this article, five mainstream machine learning algorithms were trained on, and used to classify, the raw data collected during actual inspections and case handling, and their accuracies were compared. The algorithm with the best accuracy will be used to improve the Raman spectrometer in the future. All data were collected with the EVA3000 Raman spectrometer developed by the Third Research Institute of the Ministry of Public Security. The spectrometer has been deployed in certain provinces, cities, prefectures and counties across the country. Front-line inspection personnel periodically transmit raw data back to the EVA3000’s back-office management system, through which the raw data generated during actual inspections were collected. A total of 160 cases, including phenylacetic acid, methylene chloride, ephedrine and nitrobenzene, all of which had been qualitatively identified, were randomly extracted from the uploaded database. Training and prediction with decision trees, random forests, AdaBoost, support vector machines (SVM) and artificial neural networks (ANNs) were each repeated 40, 60, 100, 150, 200, 300 and 500 times, and the average accuracy was calculated for each. The experimental results show that, among the five learning algorithms, the ranking of prediction accuracy on actual samples is roughly random forest ≈ AdaBoost > decision tree > SVM > ANNs. The verification results are generally consistent with the experimental ones.
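The repeated train-and-predict protocol described above can be illustrated with a minimal sketch. Everything here other than the list of repetition counts is an assumption made for illustration: the spectra are synthetic (one noisy Gaussian peak per class, 40 per class to give 160 samples), the 25% hold-out fraction is arbitrary, and the hypothetical `NearestCentroid` class is only a stand-in for the five classifiers compared in the paper. Any model exposing `fit` and `predict` could be passed to `average_accuracy` instead.

```python
import math
import random
from statistics import mean

RUN_COUNTS = (40, 60, 100, 150, 200, 300, 500)  # repetition counts quoted in the abstract


def make_synthetic_spectra(n_per_class=40, n_points=40, seed=0):
    """Build toy spectra (NOT real EVA3000 data): one noisy Gaussian peak per class."""
    rng = random.Random(seed)
    peak_centres = {"class_0": 8, "class_1": 16, "class_2": 24, "class_3": 32}
    samples = []
    for label, centre in peak_centres.items():
        for _ in range(n_per_class):
            shift = rng.randint(-2, 2)  # crude imitation of a peak offset
            spectrum = [
                math.exp(-((i - centre - shift) ** 2) / 8.0) + rng.gauss(0.0, 0.2)
                for i in range(n_points)
            ]
            samples.append((spectrum, label))
    return samples


class NearestCentroid:
    """Stand-in classifier: assigns each spectrum to the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(x)
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], x)]
            counts[label] += 1
        self.centroids = {
            label: [s / counts[label] for s in total] for label, total in sums.items()
        }
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        return [
            min(self.centroids, key=lambda label: squared_distance(x, self.centroids[label]))
            for x in X
        ]


def average_accuracy(make_model, samples, n_runs, test_fraction=0.25, seed=0):
    """Repeat a random split, training and prediction n_runs times; return mean accuracy."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_runs):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = make_model().fit([x for x, _ in train], [y for _, y in train])
        predicted = model.predict([x for x, _ in test])
        correct = sum(p == y for p, (_, y) in zip(predicted, test))
        accuracies.append(correct / len(test))
    return mean(accuracies)


if __name__ == "__main__":
    data = make_synthetic_spectra()
    for n_runs in RUN_COUNTS:
        acc = average_accuracy(NearestCentroid, data, n_runs)
        print(f"{n_runs:>3} runs: mean accuracy = {acc:.3f}")
```

Averaging over many random splits reduces the influence of any single lucky or unlucky partition, which is presumably why several repetition counts are reported.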
The accuracy of random forest is similar to that of AdaBoost because both algorithms repeatedly build new training sets from the original data: random forest by resampling it, and AdaBoost by increasing the weight of misclassified samples in the next round of training. SVM and ANNs, on the other hand, are perceptron-based algorithms. This suggests that, among current mainstream algorithms, ensemble methods such as bootstrap aggregating are better suited to learning from actual field samples. In the next step, the research team will continue to optimize the existing algorithms and implement them in the back-office management system for on-line detection. These results are significant for the further practical application of machine learning algorithms on the front line of public security.
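The two ways of deriving a new training set mentioned above can be sketched as follows. This is a textbook illustration, not the authors' implementation: `bootstrap_sample` shows the resampling-with-replacement step that underlies bagging and random forests, and `adaboost_reweight` shows one weight update of discrete two-class AdaBoost. Both function names are hypothetical, and the paper does not state which AdaBoost variant was used.

```python
import math
import random


def bootstrap_sample(indices, rng):
    """Bagging step: draw len(indices) items with replacement, each equally likely."""
    return rng.choices(indices, k=len(indices))


def adaboost_reweight(weights, misclassified):
    """One round of discrete (two-class) AdaBoost weight updates.

    weights       -- current sample weights, assumed to sum to 1
    misclassified -- booleans, True where the weak learner was wrong
    Returns the normalized new weights and the learner's vote weight alpha.
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if not 0.0 < error < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples are scaled up, correctly classified ones scaled down.
    updated = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated], alpha


if __name__ == "__main__":
    rng = random.Random(0)
    print("bootstrap indices:", bootstrap_sample(list(range(8)), rng))

    new_weights, alpha = adaboost_reweight([0.25] * 4, [True, False, False, False])
    print("alpha:", round(alpha, 4))
    print("new weights:", [round(w, 4) for w in new_weights])
```

In the bootstrap case every sample keeps the same chance of selection in each round, whereas the AdaBoost update concentrates weight on the samples the previous learner got wrong.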
Key words: Raman spectroscopy; Flammable and explosive chemicals; Drug precursor chemicals; Decision tree; Random forest; AdaBoost; SVM; ANNs; Public security
李志豪,沈 俊,边瑞华,郑 健. 机器学习算法用于公安一线拉曼实际样本采样学习及其准确度比较[J]. 光谱学与光谱分析, 2019, 39(07): 2171-2175.
LI Zhi-hao, SHEN Jun, BIAN Rui-hua, ZHENG Jian. Accuracy Comparison of the Machine Learning Algorithm Used to Raman Real Sample Collection in the Front Line of Public Security. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2019, 39(07): 2171-2175.