A Comparative Study of the COD Hyperspectral Inversion Models in Water Based on Machine Learning
WANG Chun-ling1, 2, SHI Kai-yuan1, 2, MING Xing3*, CONG Mao-qin3, LIU Xin-yue3, GUO Wen-ji3
1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
2. Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China
3. Nanjing Institute of Software Technology, Institute of Software, Chinese Academy of Sciences, Nanjing 210049, China
Abstract: Chemical oxygen demand (COD) is an important indicator of organic pollution in water, so measuring the COD content of water quickly and accurately is particularly important. Machine learning is increasingly applied to water quality inversion and has produced a growing body of results. Hyperspectral remote sensing offers high spectral-spatial resolution and many imaging channels, which gives it great potential for retrieving the COD of water. This study applies different hyperspectral pre-processing methods to the original hyperspectral data and uses the data before and after processing to compare how different machine learning models and different pre-processing methods perform in inverting the COD content of water. First, 1 548 groups of samples, each pairing a COD content with the corresponding hyperspectral data (400~1 000 nm), were collected in the Baodai River with a ZK-UVIR-I in-situ spectral water quality on-line monitor. To reduce the interference of spectral noise and eliminate the influence of spectral scattering, the original spectra were pre-processed with Savitzky-Golay (SG) smoothing, multiplicative scatter correction (MSC), and SG smoothing combined with MSC. Second, the sample set was randomly divided into a training set (80%) and a test set (20%). COD hyperspectral inversion models based on four machine learning methods, namely linear regression, random forest, AdaBoost, and XGBoost, were established on the full-band spectra of the pre-processed training set. Three indexes, the coefficient of determination (R²), root mean square error (RMSE), and relative analysis error (RPD), were selected to evaluate the accuracy of the inversion models. The results show that random forest, AdaBoost, and XGBoost all perform better than linear regression. The inversion model established with XGBoost has the best prediction ability whether or not the spectral data are pre-processed, with an R² of 0.92, an RMSE of 7.1 mg·L⁻¹, and an RPD of 3.4. Considering that the original spectrum may contain redundant information, principal component analysis (PCA) was used to reduce the dimensionality of the spectra after SG smoothing and MSC processing, and the top ten principal components, with a cumulative contribution rate of 95%, were selected as the input variables of the model. An inversion model was then established with XGBoost. The results show that after PCA the accuracy of the inversion model improves, with an RPD of 3.8, and the training time of the model is shortened from 72 seconds to 2.9 seconds. This research can provide new methods and ideas for establishing hyperspectral inversion models for this water area and similar water areas.
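The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 5-point quadratic SG window, the choice to leave edge points unsmoothed, the use of the mean spectrum as the MSC reference, and the definition of RPD as the sample standard deviation of the measured values divided by the RMSE are all assumptions, since the abstract does not state them. The function names `sg_smooth`, `msc`, and `evaluate` are hypothetical.

```python
import math
from statistics import mean

# 5-point quadratic Savitzky-Golay smoothing weights.
# The window size is an assumption; the abstract does not give it.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def sg_smooth(spectrum):
    """Smooth one spectrum; the two points at each edge are left unchanged."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(w * spectrum[i + k - 2] for k, w in enumerate(SG5))
    return out


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_bands = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s ≈ a + b * ref, then undo offset and slope.
        b = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        a = s_mean - b * ref_mean
        corrected.append([(x - a) / b for x in s])
    return corrected


def evaluate(y_true, y_pred):
    """Return R², RMSE, and RPD (assumed here to be sample SD / RMSE)."""
    n = len(y_true)
    y_mean = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))
    return {"r2": 1 - ss_res / ss_tot, "rmse": rmse, "rpd": sd / rmse}
```

In this sketch, `sg_smooth` would be applied to each spectrum before `msc` to mimic the combined SG and MSC treatment, and `evaluate` would be called on the measured and predicted COD values of the test set.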
Key words: Chemical oxygen demand (COD); Machine learning; Hyperspectral; Inversion model comparison
WANG Chun-ling, SHI Kai-yuan, MING Xing, CONG Mao-qin, LIU Xin-yue, GUO Wen-ji. A Comparative Study of the COD Hyperspectral Inversion Models in Water Based on Machine Learning[J]. Spectroscopy and Spectral Analysis, 2022, 42(08): 2353-2358.