Abstract: Raman spectroscopy plays an important role in modern analytical and physical chemistry, particularly in revealing the structure and properties of substances. In recent years, advances in deep learning and artificial intelligence have opened new directions for Raman spectral analysis, with strong results in spectral classification and identification. However, the performance of deep learning models depends heavily on the scale and quality of the training data. Acquiring Raman spectra is time-consuming, and the resulting data sets lack diversity, so they often cannot meet a model's training needs. Data augmentation is therefore needed to expand the training data. To address these problems, a Raman spectral sample data augmentation method based on the Voigt function is proposed. The peaks in a spectrum are fitted with Voigt functions; each fitted peak is then randomly shifted left or right and its amplitude is varied within a specified range; finally, all fitted peaks are linearly superimposed to generate a new spectrum, thereby expanding the data set. This method is compared with three commonly used Raman spectral data augmentation methods (noise addition, offset, and left-right translation) on a Raman data set with small sample sizes and multiple categories. The quality of the augmented spectra generated by each method is evaluated with two indices, structural similarity (SSIM) and the PCA total explained variance ratio, and three classification models, k-nearest neighbors (KNN), support vector machine (SVM), and a one-dimensional convolutional neural network (1D-CNN), are trained and tested to evaluate classification performance and generalization ability. The results show that the augmented Raman spectra obtained with the Voigt function method perform well on both evaluation indices.
For the classification models trained on the augmented data sets, the noise addition, offset, and Voigt function methods all reach 100% accuracy on both the validation set and the original data set with all three classifiers. In contrast, models trained with the left-right translation method perform worse, with validation-set and original-data-set accuracies of 99.80% and 96.35% (KNN), 98.75% and 100% (SVM), and 94.89% and 98.54% (1D-CNN), respectively. On simulated abnormal data, models trained on data augmented by the common methods perform poorly for certain types of abnormal data, whereas models trained on data augmented by the Voigt function method perform very well across the various types of abnormal data. In summary, the Voigt-function-based Raman spectral data augmentation method effectively increases the diversity of the augmented samples, and the trained models show good generalization ability and robustness, making the method suitable for scenarios that demand strong handling of abnormal data and high generalization ability. The method therefore has application value in Raman spectroscopy analysis.
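The peak-wise augmentation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the peaks have already been fitted, and it uses a pseudo-Voigt profile (a weighted sum of a Gaussian and a Lorentzian of equal width) as a NumPy-only stand-in for the Voigt function. The function names, peak parameters, shift limit, and amplitude range are all hypothetical.

```python
import numpy as np


def pseudo_voigt(x, center, amplitude, fwhm, eta):
    """Pseudo-Voigt peak: eta * Lorentzian + (1 - eta) * Gaussian, same FWHM.

    Assumption: used here as an approximation of the Voigt function.
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # Gaussian std from FWHM
    gamma = fwhm / 2.0                                 # Lorentzian half-width
    gauss = np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
    lorentz = gamma ** 2 / ((x - center) ** 2 + gamma ** 2)
    return amplitude * (eta * lorentz + (1.0 - eta) * gauss)


def augment_spectrum(x, peaks, max_shift=2.0, amp_range=(0.9, 1.1), rng=None):
    """Perturb each fitted peak independently, then sum the peaks linearly.

    peaks: iterable of (center, amplitude, fwhm, eta) tuples, assumed to come
    from a prior peak-fitting step that is not shown here.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.zeros_like(x, dtype=float)
    for center, amplitude, fwhm, eta in peaks:
        shift = rng.uniform(-max_shift, max_shift)  # random left/right shift
        scale = rng.uniform(*amp_range)             # random amplitude change
        spectrum += pseudo_voigt(x, center + shift, amplitude * scale, fwhm, eta)
    return spectrum


# Illustrative Raman shift axis (cm^-1) and hypothetical fitted peaks.
x = np.linspace(200.0, 1800.0, 1601)
peaks = [(520.0, 1.0, 12.0, 0.5), (1001.0, 0.6, 8.0, 0.3), (1580.0, 0.8, 20.0, 0.7)]

rng = np.random.default_rng(0)
augmented = np.stack([augment_spectrum(x, peaks, rng=rng) for _ in range(5)])
```

Because every peak is shifted and scaled independently, the relative positions and intensities of the peaks vary between generated spectra. A global left-right translation or a uniform offset cannot produce that kind of variation, which is the source of the added diversity the abstract refers to.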
Keywords: Raman spectroscopy; Data augmentation; Deep learning; One-dimensional convolutional neural network; Voigt function
BU Zi-chuan, LIU Ji-hong, REN Kai-li, LIU Chi, ZHANG Jia-geng, YAN Xue-wen (卜子川,刘继红,任凯利,刘 驰,张家庚,严学文). Raman Spectral Sample Data Enhancement Method Based on Voigt Function [J]. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2025, 45(09): 2502-2510.