The “Cluster-Regression” COD Prediction Model of Distributed Rural Sewage Based on Three-Dimensional Fluorescence Spectrum and Ultraviolet-Visible Absorption Spectrum
ZHOU Ming-rui1, 2, QU Jiang-bei2, LI Peng1, 2*, HE Yi-liang1, 2
1. China-UK Low Carbon College, Shanghai Jiaotong University, Shanghai 201306, China
2. School of Environmental Science and Engineering, Shanghai Jiaotong University, Shanghai 200240, China
Abstract: Building on the relationship between three-dimensional fluorescence spectra and the characteristic fluorescence peaks of organic matter, this study proposes a technical route in which water samples are first clustered by their three-dimensional fluorescence spectra, and a COD prediction model is then established for each class of water sample from UV-Vis full-band absorption spectrum data. The parallel factor analysis (PARAFAC) and fluorescence regional integration (FRI) algorithms were compared for fluorescence feature extraction, the fuzzy c-means (FCM) algorithm was used for clustering, and COD prediction models were established for the different classes of water samples. A total of 100 experimental water samples were collected from the effluent of different distributed rural domestic sewage treatment plants in the rural areas around Changshu City, Jiangsu Province. The measured three-dimensional fluorescence spectra were pretreated by de-scattering, and fluorescence feature data were then extracted by the PARAFAC and FRI algorithms, respectively. The FCM algorithm was then used for similarity clustering. Finally, the partial least squares (PLS) algorithm was used to establish the regression and prediction model between the UV-Vis full-band absorption spectrum and the COD of the water samples, and prediction accuracy was evaluated by the coefficient of determination (R²) and the root mean square error (RMSE). The results showed that the mean R² of the prediction models was 0.632 without clustering, 0.819 after clustering on features extracted by FRI, and 0.906 after clustering on features extracted by PARAFAC; the corresponding RMSE values were 27.857, 23.621 and 13.071. Regression and prediction accuracy improved substantially after clustering, and the model built after extracting fluorescence features with the PARAFAC algorithm had the highest prediction accuracy, with an R² 0.274 higher than that of the unclassified prediction model. The proposed COD prediction model, which combines the three-dimensional fluorescence spectrum with the UV-Vis full-band absorption spectrum through the combined “PARAFAC-FCM-PLS” algorithm, can effectively improve the prediction accuracy of COD and provides a new approach to high-precision online monitoring of water quality.
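For illustration only, the "cluster, then regress" route described in the abstract might be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, the PARAFAC/FRI feature-extraction step is replaced by random stand-in features, the fuzzy c-means and NIPALS-style PLS1 routines are written just for this example, and every function name, parameter value, and array shape is hypothetical. R² and RMSE are computed on the fitted samples only, with no validation split.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Basic fuzzy c-means. Returns (centers, U), U of shape (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # each sample's memberships sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.maximum(dist, 1e-12) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        shift = np.abs(U_new - U).max()
        U = U_new
        if shift < tol:
            break
    return centers, U


def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS. Returns (coef, intercept) so that y_hat = X @ coef + intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


def r2_and_rmse(y, y_hat):
    """Coefficient of determination and root mean square error."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, float(np.sqrt(np.mean((y - y_hat) ** 2)))


def cluster_then_regress(features, spectra, cod, n_clusters, n_components):
    """Cluster on fluorescence features, then fit one PLS model per cluster."""
    _, U = fuzzy_c_means(features, n_clusters)
    labels = U.argmax(axis=1)           # harden fuzzy memberships
    scores = {}
    for k in range(n_clusters):
        idx = labels == k
        if idx.sum() <= n_components:   # too few samples to fit this cluster
            continue
        coef, b0 = pls1_fit(spectra[idx], cod[idx], n_components)
        scores[k] = r2_and_rmse(cod[idx], spectra[idx] @ coef + b0)
    return labels, scores


# Synthetic stand-in data (NOT from the paper): two sample types whose
# "absorbance" relates to "COD" through different coefficients.
rng = np.random.default_rng(42)
features = np.vstack([rng.normal(0.0, 0.1, (30, 3)), rng.normal(1.0, 0.1, (30, 3))])
spectra = rng.normal(size=(60, 4))
is_first = np.arange(60)[:, None] < 30
true_coef = np.where(is_first, [1.0, 2.0, 0.5, -1.0], [-2.0, 0.5, 1.0, 3.0])
cod = (spectra * true_coef).sum(axis=1) + rng.normal(0.0, 0.01, 60)

labels, clustered = cluster_then_regress(features, spectra, cod, 2, 4)
coef, b0 = pls1_fit(spectra, cod, 4)    # single model without clustering
pooled_r2, pooled_rmse = r2_and_rmse(cod, spectra @ coef + b0)
print("clustered:", clustered)
print("pooled:", (pooled_r2, pooled_rmse))
```

On this toy data the per-cluster models fit better than the single pooled model because the two groups follow different absorbance-to-COD relationships. That is the intuition behind the reported gain from clustering, though the numbers produced here have no bearing on the paper's results.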
Key words: Full spectrum; Chemical oxygen demand; Parallel factor analysis; Fuzzy c-means classification; Partial least squares
周铭睿,曲江北,李 彭,何义亮. 分散式农村污水基于三维荧光光谱和紫外-可见全波段吸收光谱的“聚类-回归”COD预测模型[J]. 光谱学与光谱分析, 2022, 42(07): 2113-2119.
ZHOU Ming-rui, QU Jiang-bei, LI Peng, HE Yi-liang. The “Cluster-Regression” COD Prediction Model of Distributed Rural Sewage Based on Three-Dimensional Fluorescence Spectrum and Ultraviolet-Visible Absorption Spectrum. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2022, 42(07): 2113-2119.