Wavelength Variable Selection Method in Near Infrared Spectroscopy Based on Discrete Firefly Algorithm
LIU Ze-meng1, ZHANG Rui2, ZHANG Guang-ming1*, CHEN Ke-quan2*
1. College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, China
2. College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
Abstract: Near-infrared (NIR) spectral data sets are large and need to be compressed in order to reduce the computational cost of building a spectral calibration model and to improve the accuracy and robustness of that model. To this end, a wavelength variable selection method for NIR spectroscopy based on a discrete firefly algorithm (DFA) is proposed. First, the Monte Carlo method was used to remove outliers, and the Kennard-Stone method was applied to divide the samples into a calibration set and a prediction set. The general firefly algorithm was then discretized: the adaptive formula for attractiveness was improved and a traction weight was added to the movement formula, both to accommodate the effects of discretization and to optimize the algorithm. An elitist retention strategy was also added to the DFA to accelerate convergence. The best values of the DFA parameters were determined experimentally. Using the wavelength variables selected by the DFA, a partial least squares (PLS) regression calibration model was built for the succinic acid content of fermentation broth, and the result was compared with wavelength selection by a standard genetic algorithm (GA). For the PLS calibration model built on DFA-selected wavelengths, the correlation coefficient of the calibration set (R²c) was 0.986 with an RMSEC of 0.409, and the correlation coefficient of the prediction set (R²p) was 0.969 with an RMSEP of 0.458. In both robustness and accuracy, this model outperformed full-spectrum modeling and the GA-based wavelength selection method, which indicates the advantage of the DFA for selecting variables from NIR spectral data.
Keywords: Discrete firefly algorithm; Near-infrared spectroscopy; Wavelength selection; Succinic acid fermentation
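The abstract names the Kennard-Stone method for sample selection but gives no details. The following is a minimal sketch of the classic Kennard-Stone rule, assuming Euclidean distance between spectra; the function name `kennard_stone` and its parameters are illustrative and are not taken from the paper, and the authors' actual implementation may differ.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return the row indices of X chosen by the Kennard-Stone rule.

    X        : (n_samples, n_wavelengths) array of spectra (assumed layout)
    n_select : number of calibration samples to pick (hypothetical parameter)
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all samples.
    # This builds an n x n x p array, which is fine for a small sketch.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)

    return selected


# Toy example with one-dimensional "spectra".
toy = np.array([[0.0], [1.0], [2.0], [10.0]])
calibration_idx = kennard_stone(toy, 3)
prediction_idx = [k for k in range(len(toy)) if k not in calibration_idx]
print(calibration_idx, prediction_idx)  # [0, 3, 2] [1]
```

In this sketch the samples not picked for calibration form the prediction set, which is one common way to use Kennard-Stone for a calibration/prediction split.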