Accuracy Comparison of the Machine Learning Algorithm Used to Raman Real Sample Collection in the Front Line of Public Security

doi:10.3964/j.issn.1000-0593(2019)07-2171-05


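Abstract: Raman spectroscopy equipment is becoming increasingly common on the front line of public security, where it is mainly used to detect flammable, explosive, and drug precursor chemicals. In practice, however, front-line personnel do not operate Raman equipment with great precision, and staff without specialist knowledge cannot fully follow the optimal conditions for detection. Problems such as defocusing, offset, and overly short sampling times occur frequently, so the measured results cannot fully match the algorithms of the standard test library, which strongly affects the final comparison of results. Five mainstream machine learning algorithms were used to learn from and classify raw data collected during actual inspections and case handling, and their accuracies were compared so that the best algorithm can be used to improve the accuracy of Raman spectroscopy equipment in front-line law enforcement and inspection. All data were collected with the EVA3000 Raman spectrometer developed in-house by the Third Research Institute of the Ministry of Public Security. This spectrometer has already been deployed to some extent at the provincial, municipal, prefectural, and county levels nationwide, and front-line inspectors periodically send the raw data they collect back to the EVA3000 back-end management system. Through this management system, raw data generated during actual inspections were collected online. Taking two categories, drug precursor chemicals and flammable and explosive chemicals, as examples, 40 qualitatively identified cases each of phenylacetic acid, dichloromethane, ephedrine, and nitrobenzene were randomly selected, for a total of 160 cases. Decision tree, random forest, AdaBoost, support vector machine (SVM), and artificial neural network algorithms were each run through 40, 60, 100, 150, 200, 300, and 500 rounds of cross training and prediction, and the average accuracy was computed. The experimental results show that, among the five learning algorithms, prediction accuracy on actual samples ranks roughly as random forest ≈ AdaBoost > decision tree > SVM > artificial neural network. The results of actual testing are broadly consistent with the average prediction accuracy obtained in the experiments. Random forest and AdaBoost have similar accuracy because both algorithms, in essence, repeatedly construct new training data sets and raise the weight of misclassified samples in the next round of learning, whereas SVM and artificial neural networks are both, in essence, perceptron-based algorithms. Among the current mainstream learning algorithms, those that use bootstrap aggregating are therefore better suited to learning from actual samples, and their accuracy is also higher. In the next stage of work, the existing algorithms will be further optimized and implemented in the back-end management system, and their ability to perform online detection of substances that currently cannot be qualitatively identified will be tested. These results are of significant value for applying machine learning algorithms to practical use and online analysis, and for reducing the effect on comparison results of incorrect equipment operation by front-line personnel.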
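The repeated train-and-predict protocol described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the spectra are synthetic (`synthetic_spectra` is a hypothetical generator), the 25% test fraction is assumed because the abstract does not state the split, and a simple nearest-centroid classifier stands in for the five algorithms compared in the paper.

```python
import random
from statistics import mean

def nearest_centroid_fit(X, y):
    """Stand-in classifier (assumption): store the mean spectrum of each class."""
    sums, counts = {}, {}
    for xs, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(xs))
        for i, v in enumerate(xs):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def nearest_centroid_predict(model, xs):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(xs, model[c]))
    return min(model, key=dist)

def mean_accuracy(X, y, n_repeats, test_fraction=0.25, seed=0):
    """Average test accuracy over n_repeats random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / n_test)
    return mean(scores)

def synthetic_spectra(n_per_class=40, n_points=12, seed=1):
    """Hypothetical data: 4 classes, each with one noisy peak at a different position."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(4):
        peak = 1 + 3 * label
        for _ in range(n_per_class):
            X.append([(1.0 if abs(i - peak) <= 1 else 0.0) + rng.gauss(0, 0.3)
                      for i in range(n_points)])
            y.append(label)
    return X, y

X, y = synthetic_spectra()  # 4 classes x 40 spectra = 160 cases, as in the abstract
for n in (40, 60, 100, 150, 200, 300, 500):
    print(n, round(mean_accuracy(X, y, n), 3))
```

Averaging over many random splits reduces the dependence of the reported accuracy on any single partition of the 160 cases, which is presumably why several repetition counts were compared.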

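Keywords: Raman spectroscopy; flammable, explosive, and drug precursor chemicals; decision tree; random forest; AdaBoost; neural network; support vector machine; front line of public security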

Abstract: Raman spectroscopy equipment is gradually coming into use on the front line of public security, mainly for detecting flammable, explosive, and drug precursor chemicals. However, personnel without professional training are often unable to perform detection under optimal conditions. Frequent problems such as defocusing, offset, and overly short sampling times can strongly affect the final library comparison. In this article, five mainstream machine learning algorithms were used to train on and classify the raw data collected during actual inspections and case handling, and their accuracies were compared. The algorithm with the best accuracy will be used to improve the accuracy of Raman spectroscopy equipment in front-line use. The collected data all came from the EVA3000 Raman spectrometer developed by the Third Research Institute of the Ministry of Public Security. The spectrometer has been deployed in provinces, cities, prefectures, and counties across the country, and front-line inspection personnel periodically transmit raw data back to the EVA3000 back-office management system. Through this management system, the raw data generated during actual inspections were collected. A total of 160 cases, 40 each of phenylacetic acid, methylene chloride, ephedrine, and nitrobenzene, all of which had been qualitatively identified, were randomly drawn from the uploaded database. Decision trees, random forests, AdaBoost, support vector machines (SVM), and artificial neural networks (ANNs) were each trained and tested 40, 60, 100, 150, 200, 300, and 500 times, and the average accuracy was calculated. The experimental results show that, among the five learning algorithms, the ranking of prediction accuracy on actual samples is roughly random forest ≈ AdaBoost > decision tree > SVM > ANNs. The verification results are generally consistent with the experimental ones. The accuracy of random forest is similar to that of AdaBoost because both algorithms repeatedly build new training data sets from the original one and increase the weight of misclassified samples in the next round of training. SVM and ANNs, on the other hand, are perceptron-based algorithms. Among the current mainstream algorithms, those based on bootstrap aggregating therefore appear better suited to training on actual samples. In the next step, the research team will continue to optimize the existing algorithms and implement them in the back-office management system for online detection. The results of this paper are of great significance for further applying machine learning algorithms in practice on the front line of public security.
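The two resampling ideas mentioned above can be contrasted in a few lines. The sketch below is illustrative only and rests on textbook formulations rather than on the authors' code: bootstrap aggregating, as used in random forests, draws each new training set uniformly with replacement, while AdaBoost keeps the data fixed and raises the weights of misclassified samples. The binary discrete AdaBoost update shown here is an assumption made for brevity; the paper's four-class setting would need a multiclass variant.

```python
import math
import random

def bootstrap_indices(n, rng):
    """Bagging step: draw n sample indices uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def adaboost_reweight(weights, misclassified):
    """One discrete AdaBoost round (binary case): raise the weights of
    misclassified samples, lower the rest, then renormalize."""
    err = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)  # weight of this weak learner
    new = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new], alpha

rng = random.Random(0)
sample = bootstrap_indices(8, rng)   # some indices repeat, others are left out
weights = [1 / 8] * 8                # AdaBoost starts from uniform weights
missed = [False, False, True, False, False, False, True, False]
new_weights, alpha = adaboost_reweight(weights, missed)
print(sample)
print([round(w, 4) for w in new_weights], round(alpha, 4))
```

After the update, the two misclassified samples together carry half of the total weight, so the next weak learner is pushed toward the cases the previous one got wrong.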

Key words: Raman spectroscopy; Flammable and explosive chemicals; Drug precursor chemicals; Decision tree; Random forest; AdaBoost; SVM; ANNs; Public security

Received: 2018-06-10 Revised: 2018-10-22

Chinese Library Classification (CLC) number: TP39

Funding: Supported by the National Key R&D Program of China under the 13th Five-Year Plan (2016YFC0801304)

Corresponding author: SHEN Jun E-mail: 63368207@qq.com

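About the author: LI Zhi-hao, born in 1985, assistant researcher at the Third Research Institute of the Ministry of Public Security. E-mail: lizhihao559@hotmail.com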

Cite this article:

李志豪，沈俊，边瑞华，郑健. 机器学习算法用于公安一线拉曼实际样本采样学习及其准确度比较[J]. 光谱学与光谱分析, 2019, 39(07): 2171-2175.
LI Zhi-hao, SHEN Jun, BIAN Rui-hua, ZHENG Jian. Accuracy Comparison of the Machine Learning Algorithm Used to Raman Real Sample Collection in the Front Line of Public Security. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2019, 39(07): 2171-2175.

Link to this article:

https://www.gpxygpfx.com/CN/10.3964/j.issn.1000-0593(2019)07-2171-05 or https://www.gpxygpfx.com/CN/Y2019/V39/I07/2171