Abstract: Rapid and effective identification of contaminants is vital for reducing the impact of sudden drinking water pollution incidents. Principal component analysis (PCA) is the method most commonly used to extract features of different contaminants in drinking water from UV-Vis spectra. However, when the UV-Vis spectra of organic contaminants are highly similar, identification is ineffective if, from a purely data-driven point of view, only the features along the direction of largest variance are extracted. This paper studies the classification of organic contaminants in water distribution systems using UV-Vis spectroscopy combined with the successive projections algorithm (SPA) and a multi-class support vector machine (SVM). First, the original spectra of phenol, hydroquinone, resorcinol and m-phenylenediamine were measured with a UV spectrometer and pretreated, and the correlation between wavelength and concentration was compared for the four contaminants. The absorption peaks of phenol and resorcinol, and those of hydroquinone and m-phenylenediamine, overlap severely, which can easily interfere with classification. For feature extraction, SPA is introduced to select the characteristic wavelengths of the organic contaminants' UV-Vis spectra. Multiple linear regression (MLR) analysis is then carried out to choose the optimal parameter combination, namely the one that gives the minimum standard deviation of prediction. On this basis, a multi-class SVM is used to build an identification model for organic contaminants in drinking water. Finally, the classification results obtained from the full spectrum, from PCA features and from SPA features are compared and analyzed under different classification methods and different concentrations, which further illustrates the applicability and stability of SPA. Experimental results show that SPA-based feature extraction eliminates the interference of multicollinearity and amplifies the differences among the UV-Vis spectra of different organic contaminants, thereby improving the accuracy of the classification model.
The method provides a useful reference for identifying the types of pollutants with overlapping spectral peaks in drinking water.
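The wavelength-selection step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration of the classic successive projections algorithm, not the authors' implementation: `spa_chain` and `spa_select` are hypothetical helper names, the spectra are assumed to be a samples × wavelengths matrix stored as a list of rows, and the `score` callback is an assumed stand-in for the MLR prediction-error criterion, which is not reproduced here.

```python
import math


def _dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spa_chain(X, k0, n_select):
    """Classic SPA chain (hypothetical helper, not the paper's code).

    X is a samples x wavelengths matrix given as a list of rows. Starting
    from wavelength index k0, greedily select n_select wavelengths: at each
    step every unselected column is projected onto the orthogonal complement
    of the most recently selected (projected) column, and the column with the
    largest remaining norm is selected next.
    """
    n_samples, n_wl = len(X), len(X[0])
    if not 1 <= n_select <= min(n_samples, n_wl):
        raise ValueError("n_select must lie in [1, min(samples, wavelengths)]")
    # Working copies of the spectral columns, updated in place by each projection.
    cols = [[row[j] for row in X] for j in range(n_wl)]
    selected = [k0]
    for _ in range(n_select - 1):
        last = cols[selected[-1]]
        last_sq = _dot(last, last)
        best_j, best_norm = None, -1.0
        for j in range(n_wl):
            if j in selected:
                continue
            # x_j <- x_j - (x_j . x_k / x_k . x_k) * x_k
            coef = _dot(cols[j], last) / last_sq if last_sq > 0 else 0.0
            cols[j] = [a - coef * b for a, b in zip(cols[j], last)]
            norm = math.sqrt(_dot(cols[j], cols[j]))
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected


def spa_select(X, n_max, score):
    """Try every start column and every chain length up to n_max.

    `score(subset)` is a caller-supplied criterion where lower is better;
    it is an assumption standing in for the MLR prediction error.
    Returns (best_subset, best_score).
    """
    best, best_val = None, math.inf
    for k0 in range(len(X[0])):
        chain = spa_chain(X, k0, n_max)
        # SPA is greedy, so shorter chains are prefixes of the longest one.
        for n in range(1, n_max + 1):
            val = score(chain[:n])
            if val < best_val:
                best, best_val = chain[:n], val
    return best, best_val


if __name__ == "__main__":
    # Toy "spectra": 3 samples x 4 wavelengths. Wavelength 1 is nearly
    # collinear with wavelength 0, so SPA started at 0 skips it.
    X = [
        [1.0, 0.9, 0.0, 0.0],
        [0.0, 0.1, 0.0, 2.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    print(spa_chain(X, 0, 3))  # [0, 3, 2]
```

In practice, `score` could fit an MLR model on a calibration set using only the wavelengths in the candidate subset and return its prediction error on a validation set; the winning subset would then be passed to the multi-class SVM. Because each chain is computed once per start column and its prefixes are reused, the search costs one projection run per wavelength.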
Keywords: UV-Vis spectroscopy; Identification of organic contaminants; Successive projections algorithm; Multi-classification support vector machine
HUANG Ping-jie, LI Yu-han, YU Qiao-jun, WANG Ke, YIN Hang, HOU Di-bo, ZHANG Guang-xin (黄平捷, 李宇涵, 俞巧君, 王 柯, 尹 航, 侯迪波, 张光新). Classification of Organic Contaminants in Water Distribution Systems Developed by SPA and Multi-Classification SVM Using UV-Vis Spectroscopy. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2020, 40(07): 2267-2272.