Data Augmentation of Raman Spectra Based on DCGAN and Its Application
LI Ling-qiao1,2, LI Yan-hui2, YIN Lin-lin4, YANG Hui-hua1,2*, FENG Yan-chun3, YIN Li-hui3, HU Chang-qin3
1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Man-Machine Intelligence Laboratory, Guilin University of Electronic Technology, Guilin 541004, China
3. National Institutes for Food and Drug Control, Beijing 100050, China
4. School of Environment, Beijing Normal University, Beijing 100875, China
Abstract Raman spectroscopic detection relies on chemometric algorithms, and deep learning, currently the most popular of these, can be applied to modeling Raman spectra. However, deep learning requires large training sets, while the collection of Raman spectra is limited by equipment and labor costs. Acquiring large numbers of samples is expensive, and the spectra are also affected by fluorescence and other interference, all of which restrict the application of deep learning to Raman spectroscopy. To address these problems, this paper introduces the deep convolutional generative adversarial network (DCGAN) to extract the Raman peak features of the spectra and to generate new Raman spectra that expand the data set. The reliability of DCGAN is assessed by comparing it with the slope-bias adjusting method, another way of expanding the data set. Selection criteria for the generated spectra are designed so that the data set is filled only with spectra highly similar to the originals, which is the first step toward applying deep learning to Raman spectra. To demonstrate that the generated spectra conform well to the original spectra, four groups of comparative experiments were set up: (1) the original Raman spectra were classified with SVM, giving a classification accuracy of 51.92%; (2) the original Raman spectra were classified with CNN, giving a classification accuracy of 75.00%; (3) spectra generated by the slope-bias adjusting method were classified with CNN, giving a classification accuracy of 91.85%; (4) spectra generated by DCGAN were classified with CNN, giving a classification accuracy of 98.52%. The comparison of the four groups of results demonstrates the superiority of the Raman spectra generated by DCGAN.
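As a rough illustration of the baseline used in experiment (3), slope-bias augmentation can be sketched as below. The abstract does not give the exact formulation, so this sketch assumes the simplest form, y = slope * x + bias, with the slope drawn near 1 and the bias near 0; the function name `slope_bias_augment` and the parameters `slope_sd`, `bias_sd`, and `seed` are hypothetical and chosen only for this example.

```python
import numpy as np


def slope_bias_augment(spectrum, n_new, slope_sd=0.05, bias_sd=0.05, seed=0):
    """Return n_new perturbed copies of a 1-D spectrum.

    Assumed form (not taken from the paper): y = slope * x + bias,
    with slope ~ N(1, slope_sd) and bias ~ N(0, bias_sd) scaled by the
    standard deviation of the input intensities.
    """
    rng = np.random.default_rng(seed)
    spectrum = np.asarray(spectrum, dtype=float)
    scale = spectrum.std()

    # One random slope and one random bias per generated spectrum.
    slopes = 1.0 + rng.normal(0.0, slope_sd, size=n_new)
    biases = rng.normal(0.0, bias_sd, size=n_new) * scale

    # Broadcasting: (n_new, 1) * (n_points,) + (n_new, 1) -> (n_new, n_points)
    return slopes[:, None] * spectrum + biases[:, None]
```

Because each copy is an affine transform of the input, peak positions and relative peak shapes are unchanged; only the overall intensity scale and offset vary. A generative model such as DCGAN is not restricted in this way.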
The experimental results show that DCGAN can generate highly similar spectra through adversarial learning from only a small number of Raman spectra, and that the generated spectra are cleaner than the originals, with some interference reduced, so that generation also has a preprocessing effect on the spectra. With DCGAN, a large number of high-quality spectra can be generated and added to the original Raman spectral data set to expand its sample size, so that deep learning models can be better trained and the accuracy of classification or other models improved. This paper thus proposes a feasible scheme for applying deep learning methods to Raman spectroscopy.
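The step of keeping only highly similar generated spectra before adding them to the data set might look like the following minimal sketch. The abstract does not state the actual selection criteria, so this example assumes a Pearson correlation threshold against a reference spectrum (for instance a class mean); the function name `select_similar` and the threshold `min_corr` are hypothetical.

```python
import numpy as np


def select_similar(generated, reference, min_corr=0.95):
    """Keep generated spectra whose Pearson correlation with the
    reference spectrum is at least min_corr.

    generated: array of shape (n_spectra, n_points)
    reference: array of shape (n_points,)
    Returns (kept_spectra, correlations).
    """
    generated = np.asarray(generated, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Mean-center each spectrum so the dot product gives the covariance term.
    g = generated - generated.mean(axis=1, keepdims=True)
    r = reference - reference.mean()

    denom = np.linalg.norm(g, axis=1) * np.linalg.norm(r)
    # Constant spectra have zero norm; assign them correlation 0.
    corr = np.divide(g @ r, denom, out=np.zeros_like(denom), where=denom > 0)

    keep = corr >= min_corr
    return generated[keep], corr
```

The retained spectra can then be stacked with the originals, for example with `np.vstack([original, kept])`, to form the expanded training set.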
Key words: Raman spectrum; Data augmentation; Spectral classification; Deep convolutional generative adversarial networks