Deep Convolution Network Application in Identification of Multi-Variety and Multi-Manufacturer Pharmaceutical

doi:10.3964/j.issn.1000-0593(2019)11-3606-08


Abstract: Near-infrared (NIR) spectroscopy is efficient, non-destructive, environmentally friendly, and usable on site, which makes it especially suitable for rapid modeling and analysis of drugs. However, NIR spectra have weak absorption intensity and overlapping bands, so a robust and reliable chemometric model is needed to analyze them. The deep convolutional neural network (DCNN) is an important branch of deep learning that extracts data features layer by layer, then combines and transforms them into higher-level semantic features. It has strong modeling capacity and is widely used in computer vision, speech recognition, and other fields, but it has not yet been reported for drug NIR analysis. This paper studies multi-class modeling of drug NIR spectra based on deep convolutional network models. According to the characteristics of drug NIR data, several one-dimensional deep convolutional network models are designed for multi-variety, multi-manufacturer drug NIR classification. In these models, convolutional and pooling layers alternate to extract NIR features layer by layer, and the output layer is connected to a softmax classifier that predicts class probabilities for the drug NIR data. A global max pooling layer is placed before the output layer; it pools the feature map as a whole into a single feature point, which avoids two problems of fully connected layers: a restricted input dimension and an excessive number of parameters. Batch normalization and dropout are also introduced to prevent vanishing gradients and to reduce the risk of overfitting. During model design, networks with different numbers of convolutional layers and different convolution kernel sizes are compared to analyze their effect on modeling performance, and the influence of five classical data preprocessing methods on NIR analysis is also examined. Using NIR spectra of cefixime tablets from 7 manufacturers and phenytoin sodium tablets from 11 manufacturers in China as experimental data, multi-variety, multi-manufacturer classification models are established. The models achieve good results in both binary and multi-class experiments. In the 18-class experiment, with a training-to-test ratio of 7:3, the classification accuracy is (99.37±0.45)%, better than that of SVM, BP, AE, and ELM. The DCNN is also faster at inference than SVM and ELM, although it is slower to train than both. Extensive experiments show that the deep convolutional neural network can accurately and reliably classify NIR data of multi-variety, multi-manufacturer drugs, with good robustness and scalability. The method can also be extended to NIR data classification in tobacco, petrochemical, and other fields.

Key words: Deep convolutional neural network; Near-infrared spectroscopy; Pharmaceutical discrimination; Multi-class classification
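The architecture described in the abstract (alternating convolution and pooling layers, a global max pooling layer, then a softmax output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer count, channel widths, kernel size, random weights, and all function names below are assumptions chosen for illustration, and batch normalization and dropout are omitted for brevity. The point it shows is that global max pooling yields a feature vector whose length depends only on the number of channels, so spectra of different lengths can share the same output layer.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D cross-correlation. x: (C_in, L), w: (C_out, C_in, K), b: (C_out,)."""
    k = w.shape[2]
    # windows has shape (C_in, L - K + 1, K)
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)
    return np.einsum("clk,ock->ol", windows, w) + b[:, None]

def max_pool1d(x, size):
    """Non-overlapping max pooling along the spectral axis."""
    c, l = x.shape
    l_trim = (l // size) * size  # drop a trailing remainder, if any
    return x[:, :l_trim].reshape(c, l // size, size).max(axis=2)

def global_max_pool(x):
    """Collapse each feature map to one value: (C, L) -> (C,)."""
    return x.max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def init_params(rng, channels=(1, 8, 16), kernel=5, n_classes=18):
    """Random weights for illustration only (hypothetical sizes)."""
    conv = [(rng.normal(0, 0.1, (co, ci, kernel)), np.zeros(co))
            for ci, co in zip(channels[:-1], channels[1:])]
    out = (rng.normal(0, 0.1, (n_classes, channels[-1])), np.zeros(n_classes))
    return {"conv": conv, "out": out}

def forward(spectrum, params):
    """Map one spectrum of any sufficient length to class probabilities."""
    x = spectrum[None, :]  # single input channel: (1, L)
    for w, b in params["conv"]:
        x = np.maximum(conv1d(x, w, b), 0.0)  # convolution + ReLU
        x = max_pool1d(x, 2)
    features = global_max_pool(x)  # length = number of channels
    w_out, b_out = params["out"]
    return softmax(w_out @ features + b_out)

rng = np.random.default_rng(0)
params = init_params(rng)
probs_short = forward(rng.normal(size=500), params)   # 500-point spectrum
probs_long = forward(rng.normal(size=1200), params)   # 1200-point spectrum
```

Both calls return an 18-element probability vector from the same weights, even though the input lengths differ. With a flattening fully connected layer in place of `global_max_pool`, the output weight matrix would have to be sized for one fixed input length.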

Received: 2019-03-04 Revised: 2019-07-11

Chinese Library Classification (CLC) number: TP391

Funding: Supported by the National Natural Science Foundation of China (21365008, 61562013)

Corresponding authors: FENG Yan-chun, YANG Hui-hua. E-mail: fyc@nifdc.org.cn; yhh@bupt.edu.cn

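About the authors: LI Ling-qiao, born in 1986, PhD candidate, School of Automation, Beijing University of Posts and Telecommunications. E-mail: 54pe@163.com
PAN Xi-peng, born in 1985, PhD candidate, School of Automation, Beijing University of Posts and Telecommunications. E-mail: pxp201@bupt.edu.cn
LI Ling-qiao and PAN Xi-peng are co-first authors.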

Cite this article:

LI Ling-qiao, PAN Xi-peng, FENG Yan-chun, YIN Li-hui, HU Chang-qin, YANG Hui-hua. Deep Convolution Network Application in Identification of Multi-Variety and Multi-Manufacturer Pharmaceutical[J]. Spectroscopy and Spectral Analysis, 2019, 39(11): 3606-3613.

Link to this article:

https://www.gpxygpfx.com/CN/10.3964/j.issn.1000-0593(2019)11-3606-08 or https://www.gpxygpfx.com/CN/Y2019/V39/I11/3606