Hyperspectral Estimation Model of Soil Organic Matter Content Using Generative Adversarial Networks

doi:10.3964/j.issn.1000-0593(2021)06-1905-07


Abstract: Most existing models for estimating soil organic matter content are built on spectral feature bands combined with linear or non-linear empirical models, and few of them consider enlarging the modeling dataset as a way to improve estimation ability. To further improve the accuracy of hyperspectral inversion of soil organic matter, this study proposes a dynamic estimation model that uses generative adversarial networks (GAN) to synthesize pseudo hyperspectral data together with organic matter content. Paddy soil samples and measured hyperspectral data (350-2500 nm) were collected from Changsha and its surrounding areas in Hunan Province, and the organic matter content was determined by laboratory chemical analysis. Based on these data, the GAN generated as many new samples as the original modeling set contained, and the generated samples were combined with the modeling set to form an enhanced modeling set. After each epoch of formal GAN training, soil organic matter estimation models were dynamically constructed with cross-validation ridge regression (RCV), partial least squares regression (PLSR) and BP neural network (BPNN) at four observation points (corresponding to 50, 100, 150 and 239 generated samples in the enhanced modeling set). The resulting models are abbreviated GAN-RCV, GAN-PLSR and GAN-BPNN, and all were evaluated on the same test set.
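The observation-point procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the GAN is replaced by a noise-jitter placeholder, the data are synthetic, the ridge model is a small NumPy closed-form version with k-fold selection of the penalty, and all function names, the band count (20), the modeling-set size (239) and the test-set size (60) are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def ridge_cv_fit(X, y, alphas=(0.01, 0.1, 1.0, 10.0), k=5, seed=0):
    """Choose alpha by k-fold cross-validation, then refit on all rows."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)

    def cv_mse(alpha):
        errs = []
        for fold in folds:
            train = np.setdiff1d(idx, fold)
            w, b = ridge_fit(X[train], y[train], alpha)
            errs.append(np.mean((X[fold] @ w + b - y[fold]) ** 2))
        return np.mean(errs)

    return ridge_fit(X, y, min(alphas, key=cv_mse))

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot), float(np.sqrt(ss_res / len(y_true)))

def evaluate_observation_points(X_model, y_model, X_gen, y_gen,
                                X_test, y_test, points=(50, 100, 150, 239)):
    """Fit on modeling set + first n generated samples; score on a fixed test set."""
    results = {}
    for n in points:
        X_aug = np.vstack([X_model, X_gen[:n]])
        y_aug = np.concatenate([y_model, y_gen[:n]])
        w, b = ridge_cv_fit(X_aug, y_aug)
        results[n] = r2_rmse(y_test, X_test @ w + b)
    return results

# Synthetic stand-in data (assumption: 20 "bands", linear ground truth).
rng = np.random.default_rng(42)
true_w = rng.normal(size=20)

def make_samples(n):
    X = rng.normal(size=(n, 20))
    return X, X @ true_w + 0.1 * rng.normal(size=n)

X_model, y_model = make_samples(239)
X_test, y_test = make_samples(60)

# Placeholder for GAN output: jittered copies of the modeling set, NOT a real GAN.
X_gen = X_model + 0.05 * rng.normal(size=X_model.shape)
y_gen = y_model + 0.05 * rng.normal(size=y_model.shape)

results = evaluate_observation_points(X_model, y_model, X_gen, y_gen, X_test, y_test)
for n, (r2, rmse) in results.items():
    print(f"{n:>3} generated samples: R2={r2:.4f}, RMSE={rmse:.4f}")
```

In the setting the abstract describes, this loop would be rerun after every GAN training epoch with freshly generated samples, and the best-scoring model on the test set would be kept.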
The experimental results showed that: (1) Among the estimation models fitted on the original modeling set, RCV performed best, with a coefficient of determination (R²) of 0.8311 and a root mean square error (RMSE) of 0.1896. (2) Over the 150 epochs of formal GAN training, the GAN-RCV, GAN-PLSR and GAN-BPNN models dynamically constructed on the enhanced modeling set improved markedly. For GAN-RCV, R² reached a maximum of 0.8909 (RMSE 0.1537), a minimum of 0.8505 (RMSE 0.18) and a mean of 0.8687 (RMSE 0.1686); the maximum R² was 7.2% higher than that of RCV fitted on the original modeling set (RMSE 18.9% lower). For GAN-PLSR, R² reached a maximum of 0.8554 (RMSE 0.1769), a minimum of 0.7270 (RMSE 0.2432) and a mean of 0.7801 (RMSE 0.2177); the maximum R² was 20.6% higher than that of PLSR fitted on the original modeling set (RMSE 29.5% lower). GAN-BPNN performed best, with R² reaching a maximum of 0.9052 (RMSE 0.1433), a minimum of 0.8017 (RMSE 0.2073) and a mean of 0.8681 (RMSE 0.1686); the maximum R² was 30.8% higher than that of BPNN fitted on the original modeling set (RMSE 44.5% lower). (3) As the number of generated samples in the enhanced modeling set increased, the gain in model accuracy first rose and then fell, and of the four observation points the third gave the largest performance improvement. Extensive experiments showed that the GAN-based dynamically constructed estimation models substantially improved prediction performance. Based on the evaluation results on the test set, the best-performing model can be selected for subsequent estimation of soil organic matter content.
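The percentage improvements quoted above read as relative changes with respect to the baseline model. As a small check under that assumption, the RCV figures from the abstract (R² 0.8311 to 0.8909, RMSE 0.1896 to 0.1537) can be plugged in; the helper name `relative_change` is hypothetical.

```python
def relative_change(baseline, new):
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0

# Values reported in the abstract for RCV (baseline) and the best GAN-RCV.
baseline_r2, baseline_rmse = 0.8311, 0.1896
best_r2, best_rmse = 0.8909, 0.1537

r2_gain = relative_change(baseline_r2, best_r2)            # increase in R²
rmse_drop = -relative_change(baseline_rmse, best_rmse)     # decrease in RMSE

print(f"R2 gain: {r2_gain:.1f}%, RMSE drop: {rmse_drop:.1f}%")
# prints "R2 gain: 7.2%, RMSE drop: 18.9%"
```

Both values agree with the 7.2% and 18.9% reported for GAN-RCV, which supports reading the other percentages in the same way.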

Key words: Soil organic matter; Hyperspectral data; Generative adversarial networks; Cross-validation ridge regression; BP neural network

Received: 2020-06-11 Revised: 2020-10-03

CLC number: O657.3

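Funding: Supported by the Special Program for National Basic Science and Technology Work (2014FY110200), the National Natural Science Foundation of China (61973111), and the Key Project of the Education Department of Hunan Province (19A242)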

Corresponding author: XIE Hong-xia E-mail: xiehongxia136@sina.com

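About the author: HE Shao-fang, born in 1980, faculty member at the College of Information and Intelligent Science and Technology, Hunan Agricultural University. E-mail: wxdyzp@sina.com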

Cite this article:

HE Shao-fang, SHEN Lu-ming, XIE Hong-xia. Hyperspectral Estimation Model of Soil Organic Matter Content Using Generative Adversarial Networks. Spectroscopy and Spectral Analysis, 2021, 41(06): 1905-1911.

Link to this article:

https://www.gpxygpfx.com/CN/10.3964/j.issn.1000-0593(2021)06-1905-07 or https://www.gpxygpfx.com/CN/Y2021/V41/I06/1905
