Stacked Contractive Auto-Encoders Application in Identification of Pharmaceuticals
GAN Bo-rui1, YANG Hui-hua1,2*, ZHANG Wei-dong1, FENG Yan-chun3, YIN Li-hui3, HU Chang-qin3
1. College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
2. College of Automation, Beijing University of Posts & Telecommunications, Beijing 100876, China
3. National Institutes for Food and Drug Control, Beijing 100050, China
Abstract Near-infrared (NIR) spectroscopy is widely used in many fields because it offers fast analysis, non-destructive testing and field detection. However, NIR spectra suffer from a low signal-to-noise ratio, weak absorption intensity and overlapping peaks, so qualitative or quantitative information cannot be read directly from the spectrum. NIR spectroscopy can therefore only be used as an indirect analytical technique, and spectral modeling methods are the core of NIR analysis. Deep learning is a new branch of machine learning that has been successfully applied in many fields; its network structure and non-linear activations make it especially suitable for modeling high-dimensional, nonlinear, large-scale data. To further enrich the available NIR spectroscopy (NIRS) modeling methods and improve the accuracy of NIRS analysis, new modeling methods are needed. This paper studies the qualitative analysis of NIR spectra and proposes a model based on Stacked Contractive Auto-Encoders (SCAE) to identify the same drug produced by different manufacturers on the market. An auto-encoder network reduces the dimension of the data in order to learn its internal features. The Contractive Auto-Encoder (CAE) extends the auto-encoder by adding the Jacobian matrix as a constraint. Because the Jacobian matrix contains information in all directions, the extracted features are invariant to a certain degree of perturbation of the input data, which improves the feature-extraction ability of the auto-encoder. SCAE is a multi-layer CAE neural network: the hidden layer of each CAE serves as the input layer of the next CAE, and all network parameters are obtained by greedy layer-by-layer training. After pre-training, the whole network is treated as a single model and fine-tuned with the backpropagation algorithm, and a Logistic/Softmax classifier is finally used for qualitative analysis. The experimental data were collected by the National Institutes for Food and Drug Control, with Cefixime Capsules used for the binary classification experiment and Isosorbide Dinitrate Tablets for the multi-class classification experiment. The spectral curves were obtained by measuring the absorbance of each sample at different wavelengths with a Bruker Matrix spectrometer, and the spectra were then preprocessed with OPUS software to eliminate drift and other interfering factors. The model was then built with the coefficient lambda of the Jacobian constraint set to 0.003, a value determined experimentally. The modeling process was divided into five stages: pre-processing, pre-training, fine-tuning, testing and comparison. To evaluate the SCAE network in terms of classification accuracy, algorithm stability and modeling time, it was compared with a BP neural network, SVM, Sparse Auto-Encoders (SAE) and Denoising Auto-Encoders (DAE). SCAE had the highest classification accuracy and algorithm stability at different ratios of training set to test set. In terms of modeling time, SVM has a large advantage over the other algorithms because it needs neither pre-training nor feature extraction, but SCAE was faster than all the other comparison algorithms. In summary, using SCAE for drug identification is effective and feasible.
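The contractive constraint described in the abstract can be illustrated with a minimal sketch of a single CAE layer's objective. It rests on assumptions not stated in the abstract: a sigmoid encoder and decoder, tied weights, a squared-error reconstruction term, and random data standing in for a spectrum. The function names are hypothetical; only the coefficient 0.003 comes from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    """Hidden code h = sigmoid(W x + b) for one sample x of shape (d,)."""
    return sigmoid(W @ x + b)

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the encoder Jacobian dh/dx.

    For a sigmoid encoder, J[j, i] = h_j (1 - h_j) W[j, i], so
    ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W[j, i]^2.
    """
    h = encode(x, W, b)
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1)))

def cae_loss(x, W, b, c, lam=0.003):
    """Reconstruction error plus lam times the Jacobian penalty."""
    h = encode(x, W, b)
    x_hat = sigmoid(W.T @ h + c)            # assumed: decoder reuses W transposed
    recon = float(np.sum((x - x_hat) ** 2))
    return recon + lam * contractive_penalty(x, W, b)

rng = np.random.default_rng(0)
x = rng.random(8)                           # stand-in for one spectrum (not real data)
W = rng.normal(scale=0.1, size=(4, 8))      # 8 inputs -> 4 hidden units
b, c = np.zeros(4), np.zeros(8)
print(cae_loss(x, W, b, c))
```

In a stacked setting, the code `h` produced by one trained layer would be fed as `x` to the next layer, which is the greedy layer-by-layer scheme the abstract refers to.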
Received: 2017-12-07
Accepted: 2018-04-11
Corresponding Authors:
YANG Hui-hua
E-mail: 13718680586@139.com