Drugs Identification Using Near-Infrared Spectroscopy Based on Random Forest and CatBoost
JIANG Ping1, LU Hao-xiang2, LIU Zhen-bing2*
1. School of Computer and Information Technology, Guangxi Police College, Nanning 530028, China
2. College of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
Abstract Drug quality bears on public health and national well-being. With the rapid development of the economy and society, rapid and effective identification of drug quality has become extremely important. Spectral analysis offers high accuracy, fast analysis and no contamination of samples, and it is widely used in the chemical industry, petroleum, medicine and other areas important to people’s livelihood. To address the low accuracy, slow identification and poor stability of traditional drug identification models, a spectrometer was used to collect near-infrared spectra of drugs so that samples could be measured without contamination, and random forest and CatBoost were then combined to classify and identify the drugs quickly and accurately. The proposed method first uses Random Forest (RF) to screen the spectral data for effective characteristic wavelengths, eliminating irrelevant wavelengths and retaining those that best characterize the sample properties. An Extreme Learning Machine (ELM) is then used as the weak classifier of CatBoost to analyze the screened characteristic wavelengths and identify the drug attributes. Because ELM contains only one hidden layer and requires no iterative optimization, the identification model runs fast, while CatBoost improves identification accuracy by integrating the weak classifiers. To evaluate the proposed model, training sets of different sizes were randomly selected from the drug spectral data, the experiments were run independently, and the mean of 10 runs was taken as the final result. In addition, the proposed model was compared with Back Propagation (BP) with CatBoost, Support Vector Machine (SVM), BP, ELM, Summation Wavelet Extreme Learning Machine (SWELM) and Boosting. The classification results for training sets of different sizes show that, as the training set grows, the highest classification accuracy reaches 100% and the standard deviation of the predictions tends to 0. The experimental results show that the proposed RF-CatBoost identification model has higher classification accuracy, faster speed and stronger robustness than the comparison methods on drug data sets of different sizes, and that it can be widely used for accurate identification of drug categories to support effective supervision of drug quality.
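The abstract's point that an ELM has a single hidden layer and needs no iterative optimization, together with its protocol of averaging 10 runs on randomly selected training sets, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the RF wavelength screening and the CatBoost integration step are omitted, the data are synthetic stand-ins for NIR spectra, and every name and parameter here (`ELM`, `make_toy_spectra`, `repeated_split_accuracy`, `n_hidden`, `ridge`) is a hypothetical choice made for illustration.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine (illustrative sketch only)."""

    def __init__(self, n_hidden=30, ridge=1.0, seed=0):
        self.n_hidden = n_hidden  # assumed hidden-layer width
        self.ridge = ridge        # assumed regularization strength
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Fixed random projection followed by a tanh nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden)) / np.sqrt(n_features)
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights come from one regularized least-squares solve,
        # so there is no iterative optimization loop.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def make_toy_spectra(n_per_class=100, n_wavelengths=40, seed=0):
    """Synthetic two-class data; only the first five columns carry class information."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_class, n_wavelengths))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, :5] += 3.0
    return X, y


def repeated_split_accuracy(X, y, train_size, n_runs=10, seed=0):
    """Mean and standard deviation of test accuracy over random train/test splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for run in range(n_runs):
        idx = rng.permutation(len(y))
        train, test = idx[:train_size], idx[train_size:]
        model = ELM(seed=run).fit(X[train], y[train])
        scores.append(np.mean(model.predict(X[test]) == y[test]))
    return float(np.mean(scores)), float(np.std(scores))


X, y = make_toy_spectra()
for size in (60, 120):
    mean_acc, std_acc = repeated_split_accuracy(X, y, train_size=size)
    print(f"train={size}: mean accuracy={mean_acc:.3f}, std={std_acc:.3f}")
```

Reporting the standard deviation alongside the mean mirrors the abstract's use of prediction standard deviation as a stability measure. The numbers this toy example prints say nothing about the paper's results.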
Received: 2022-01-12
Accepted: 2022-03-28
Corresponding Authors:
LIU Zhen-bing
E-mail: zbliu@quet.edu.cn