COD Concentration Prediction Model Based on Multi-Spectral Data Fusion and GANs Algorithm
CHEN Ying1, XU Yang-mei1, DI Yuan-jian1, CUI Xing-ning1, ZHANG Jie1, ZHOU Xin-de1, XIAO Chun-yan2, LI Shao-hua3
1. Hebei Province Key Laboratory of Test/Measurement Technology and Instrument, School of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China
2. School of Resources and Environment, Henan University of Technology, Jiaozuo 454000, China
3. Hebei Sailhero Environmental Protection Hi-Tech Co., Ltd., Shijiazhuang 050000, China
Abstract: Excessive concentrations of organic pollutants in water cause serious environmental pollution and endanger human health. Chemical oxygen demand (COD) can be used to characterize the degree of organic pollution in water. A quantitative prediction model of COD concentration based on the generative adversarial networks (GANs) algorithm is proposed, which combines ultraviolet (UV) and near-infrared (NIR) spectra through data-level data fusion (DLDF) and feature-level data fusion (FLDF). First, COD standard samples are prepared along a concentration gradient, and the UV spectrum (190~310 nm) and NIR spectrum (830~2100 nm) of each standard sample are collected. First-derivative and Savitzky-Golay (S-G) smoothing pretreatment is applied to the UV and NIR spectral data to eliminate baseline drift and interference noise. Data-level and feature-level fusion are then carried out directly on the pretreated UV and NIR spectra, and the COD concentration prediction model is constructed with the GANs algorithm. The model is evaluated by the squared correlation coefficient (R²), the root mean square error of prediction (RMSEP) between the predicted and true concentration values, and the prediction deviation. The results show that neither the FLDF model nor the DLDF model is ideal. The analysis shows that, because the data in the UV and NIR bands are unbalanced, the contribution of the UV spectrum to the model is masked by the NIR band, which makes the spectral fusion meaningless. To avoid this fusion failure, normalization of the mixed spectra is proposed.
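The pretreatment step described above (a first derivative combined with Savitzky-Golay smoothing) can be sketched as follows. This is only an illustration: the 5-point quadratic window, the function name `sg_first_derivative` and the sample values are assumptions, since the abstract does not state the window width or polynomial order that were used.

```python
# Savitzky-Golay first derivative with a fixed 5-point quadratic window.
# Window width and polynomial order are ASSUMED for illustration only;
# the abstract does not report the settings used by the authors.

# Standard S-G first-derivative weights for a 5-point quadratic fit.
# They are divided by 10 * step when applied.
SG_D1_WEIGHTS = (-2.0, -1.0, 0.0, 1.0, 2.0)


def sg_first_derivative(absorbance, step=1.0):
    """Return the smoothed first derivative of one spectrum.

    absorbance: absorbance values at evenly spaced wavelengths.
    step: wavelength spacing in nm (assumed constant).
    The two points at each edge are dropped because the window
    does not fit there.
    """
    width = len(SG_D1_WEIGHTS)
    half = width // 2
    if len(absorbance) < width:
        raise ValueError("spectrum is shorter than the smoothing window")
    derivative = []
    for center in range(half, len(absorbance) - half):
        window = absorbance[center - half:center + half + 1]
        weighted = sum(w * a for w, a in zip(SG_D1_WEIGHTS, window))
        derivative.append(weighted / (10.0 * step))
    return derivative


if __name__ == "__main__":
    # Made-up spectrum with a linear trend, and the same spectrum
    # shifted by a constant baseline offset.
    spectrum = [0.5 + 0.02 * i for i in range(8)]
    shifted = [a + 1.0 for a in spectrum]
    print(sg_first_derivative(spectrum))
    print(sg_first_derivative(shifted))
```

Because the weights sum to zero, a constant baseline offset has essentially no effect on the derivative, which is why this kind of pretreatment is used against baseline drift.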
The effects of standard normal variate (SNV), maximum-minimum normalization (MMN) and vector normalization (VN) on the modeling are discussed. The normalized UV and NIR spectral data are then fused again under the given number of sub-intervals; the fused spectra are taken as the input X of the GANs model and the measured COD value as the output Y. COD concentration prediction models are established for each normalization method. The modeling results show that the normalization method has a great influence on the mixed-spectrum data fusion model, and that the prediction accuracy of both the data-level and the feature-level fusion models improves significantly compared with the models built before normalization, with maximum-minimum normalization giving the most obvious improvement. Finally, to verify the accuracy of the multi-spectral data fusion GANs prediction model, GANs prediction models are also established from a single spectral source: one on the full-wavelength UV band and one on the full-wavelength NIR band. The experimental results show that, for the feature-level spectral fusion model based on the UV and NIR spectra, the correlation coefficient is 0.9947 and the RMSEP is 0.976; the prediction error is 52.9% lower than that of data-level fusion, and the predicted recovery rate is 98.4%~103.1%, much better than the other groups. The model generalizes well and predicts accurately.
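The three normalization methods and a data-level fusion step can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, each band is normalized separately before concatenation (the abstract does not say whether normalization was applied per band or to the joined vector), and the sub-interval selection and the GANs model itself are not shown.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum
    by its own mean and sample standard deviation (n - 1)."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)
    return [(a - mean) / std for a in spectrum]


def mmn(spectrum):
    """Maximum-minimum normalization: rescale one spectrum to [0, 1]."""
    low, high = min(spectrum), max(spectrum)
    return [(a - low) / (high - low) for a in spectrum]


def vn(spectrum):
    """Vector normalization: divide one spectrum by its Euclidean length."""
    length = math.sqrt(sum(a * a for a in spectrum))
    return [a / length for a in spectrum]


def fuse_data_level(uv, nir, normalize=mmn):
    """ASSUMED data-level fusion: normalize each band on its own,
    then concatenate them into one input vector X."""
    return normalize(uv) + normalize(nir)


if __name__ == "__main__":
    # Made-up values on very different scales, standing in for the
    # unbalanced UV and NIR bands described in the abstract.
    uv = [0.1, 0.2, 0.4]
    nir = [10.0, 30.0, 20.0]
    print(fuse_data_level(uv, nir, normalize=mmn))
    print(fuse_data_level(uv, nir, normalize=snv))
    print(fuse_data_level(uv, nir, normalize=vn))
```

After normalization both bands occupy a comparable numeric range, so neither band dominates the fused vector purely because of its scale. A constant spectrum would cause a division by zero in all three functions; real code would need to guard against that.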
Compared with a monitoring model built on a single spectral source, data fusion of mixed spectra captures more of the chemical information in water samples, reveals the degree of pollution of a water body more comprehensively, reflects differences among pollutants at different levels, and provides technical support for on-line monitoring of COD concentration in water.
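The evaluation quantities named in the abstract (squared correlation coefficient, RMSEP and recovery rate) can be computed as in the sketch below. The function names and the sample concentrations are illustrative assumptions and are not data from the paper; recovery is taken here as the predicted value divided by the measured value, expressed in percent.

```python
import math


def r_squared(measured, predicted):
    """Square of the Pearson correlation coefficient between
    measured and predicted concentrations."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)


def rmsep(measured, predicted):
    """Root mean square error of prediction."""
    squared = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def recovery_percent(measured, predicted):
    """Per-sample recovery rate: predicted / measured * 100."""
    return [100.0 * p / m for m, p in zip(measured, predicted)]


if __name__ == "__main__":
    # Made-up COD values for illustration only.
    measured = [10.0, 20.0, 30.0, 40.0]
    predicted = [11.0, 19.0, 31.0, 39.0]
    print(r_squared(measured, predicted))
    print(rmsep(measured, predicted))
    print(recovery_percent(measured, predicted))
```

A recovery range close to 100%, such as the 98.4%~103.1% reported above, means that each predicted concentration lies within a few percent of its measured value.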
陈 颖,许扬眉,邸远见,崔行宁,张 杰,周鑫德,肖春艳,李少华. 多光谱数据融合和GANs算法的COD浓度预测[J]. 光谱学与光谱分析, 2021, 41(01): 188-193.
CHEN Ying, XU Yang-mei, DI Yuan-jian, CUI Xing-ning, ZHANG Jie, ZHOU Xin-de, XIAO Chun-yan, LI Shao-hua. COD Concentration Prediction Model Based on Multi-Spectral Data Fusion and GANs Algorithm. Spectroscopy and Spectral Analysis, 2021, 41(01): 188-193.