Wasserstein GAN for the Classification of Unbalanced THz Database
ZHU Rong-sheng1, 2, SHEN Tao1, 2*, LIU Ying-li1, 2, ZHU Yan1, 2, CUI Xiang-wei1, 2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
2. Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650504, China
Abstract: The terahertz spectrum of a substance is unique. Combined with advanced machine learning methods, terahertz spectrum recognition based on large-scale spectral databases has become a focus of terahertz application research. Class-balanced spectral data covering many materials, which is the basis for classifying terahertz spectra, is difficult to collect. This paper proposes a recognition method for unbalanced terahertz spectra based on WGAN (Wasserstein Generative Adversarial Network). As a data generation method, WGAN provides data generated once the model has reached Nash equilibrium to supplement the data set, and a support vector machine (SVM) is then trained on the supplemented set. The experimental results show that the generated data effectively reproduce the distribution of the real data, and that mixing the generated data with the real data improves recognition accuracy on unbalanced spectral data. Three maltose compounds with similar characteristic spectra are used for verification. We first use Savitzky-Golay (S-G) filtering and cubic spline interpolation to normalize the spectral data of the three substances, and then expand the unbalanced terahertz spectral data with a WGAN model until the classes are balanced. All experiments are evaluated on the same test set, and three sets of comparative experiments demonstrate the effectiveness of WGAN on unbalanced data sets. First, we use WGAN to generate data. As the number of iterations increases, the generated data gradually approach the real data distribution; when the model reaches Nash equilibrium, the generated data largely conform to the original data distribution.
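For orientation, the generic WGAN objective as formulated by Arjovsky et al. (2017) is reproduced below. The abstract does not state which variant (for example weight clipping or a gradient penalty) or which network architecture was used, so this is only the standard form. The symbols are notation assumed here: G is the generator, D is the critic constrained to be 1-Lipschitz, P_r is the real spectral data distribution, and p(z) is the noise prior.

```latex
\min_{G}\;\max_{\|D\|_{L}\le 1}\;
  \mathbb{E}_{x\sim P_r}\bigl[D(x)\bigr]
  \;-\;
  \mathbb{E}_{z\sim p(z)}\bigl[D(G(z))\bigr]
```

The inner maximum estimates the Wasserstein-1 distance between the real and generated distributions, so a shrinking value during training corresponds to generated spectra moving closer to the real data distribution.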
The experimental results show that training the SVM on the WGAN-expanded data set resolves the bias on the test set whereby small-sample classes (maltotriose, maltohexaose) are misclassified toward the large-sample class (maltoheptaose). Comparing WGAN with two traditional methods for handling unbalanced data sets, FWSVM and COPY, we find that all three approaches reach a training-set accuracy above 90% on dataset-1. However, because of limited generalization ability, the traditional methods perform unsatisfactorily on the test set, whereas the test-set accuracy with WGAN reaches 91.54%. To examine different degrees of imbalance, data sets with imbalance degrees of 16, 81, and 256 were used for verification. The accuracies on the three test sets are 92.08%, 91.54%, and 90.27%, respectively, which meets the need to handle different degrees of imbalance in practical work.
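The preprocessing step named in the abstract (S-G filtering followed by cubic spline interpolation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `preprocess_spectrum`, the window length, the polynomial order, the frequency range, the 256-point grid, and the final min-max scaling are all assumptions, and the spectrum is synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.interpolate import CubicSpline


def preprocess_spectrum(freq, absorb, grid, window_length=11, polyorder=3):
    """Smooth one spectrum with an S-G filter, then resample it onto a common grid.

    All parameter defaults are assumed values chosen for illustration.
    """
    # Savitzky-Golay smoothing: local polynomial fit over a sliding window.
    smoothed = savgol_filter(absorb, window_length=window_length, polyorder=polyorder)
    # Cubic spline through the smoothed points; freq must be strictly increasing.
    spline = CubicSpline(freq, smoothed)
    resampled = spline(grid)
    # Assumed extra step: min-max scaling to [0, 1] so spectra share one range.
    lo, hi = resampled.min(), resampled.max()
    if hi > lo:
        return (resampled - lo) / (hi - lo)
    return np.zeros_like(resampled)


# Synthetic example: a noisy absorption peak sampled at 120 points (hypothetical data).
rng = np.random.default_rng(0)
freq = np.linspace(0.2, 2.0, 120)   # THz, assumed range
absorb = np.exp(-((freq - 1.1) / 0.15) ** 2) + 0.05 * rng.normal(size=freq.size)
grid = np.linspace(0.2, 2.0, 256)   # common grid shared by all spectra
out = preprocess_spectrum(freq, absorb, grid)
print(out.shape)
```

Resampling every spectrum onto the same grid gives fixed-length vectors, which is what both a generative model and an SVM would need as input.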
朱荣盛,沈 韬,刘英莉,朱 艳,崔向伟. 基于WGAN的不均衡太赫兹光谱识别[J]. 光谱学与光谱分析, 2021, 41(02): 425-429.
ZHU Rong-sheng, SHEN Tao, LIU Ying-li, ZHU Yan, CUI Xiang-wei. Wasserstein GAN for the Classification of Unbalanced THz Database. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2021, 41(02): 425-429.