Study on Combinatorial Optimization of Spectral Principal Components Using Successive Projections Algorithm
WU Di1, JIN Chun-hua1,2, HE Yong1*
1. College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310029, China
2. College of Life Science and Biological Engineering, Ningbo University, Ningbo 315211, China
Abstract: The successive projections algorithm (SPA) was used to select the optimal combination of principal components (PCs) obtained by principal component analysis (PCA). Short-wave near-infrared spectra of milk powder were first analyzed by PCA, and SPA was then used to determine the optimal combination among the first eight PCs. For fat content prediction the optimal combination was PC1, PC2, PC4, PC5, PC6 and PC7; for protein content prediction it was PC1, PC2, PC3, PC4, PC5 and PC8. Least-squares support vector machine (LS-SVM) models were built with different PC combinations as inputs to predict the fat and protein content of the milk powder. For both fat and protein, the PC combination selected by SPA gave better prediction results than using the first four PCs through the first eight PCs. With the SPA-selected combinations, the coefficient of determination for prediction (R2p), root mean square error of prediction (RMSEP) and residual predictive deviation (RPD) were 0.9890, 0.1703 and 9.5343 for fat, and 0.9876, 0.1348 and 8.9274 for protein. These results indicate that SPA can select the optimal PC combination quickly and effectively; the selection procedure is simple and does not require tuning a large number of parameters.
Keywords: Successive projections algorithm; Spectroscopy; Principal component analysis; Principal component selection; Milk powder
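The selection step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`spa_chain`, `rmse_of_subset`, `spa_select`) are hypothetical, the data are synthetic stand-ins for PC scores, and candidate subsets are compared by the validation RMSE of an ordinary least-squares fit, which is an assumption made here for simplicity (the paper's final predictions use LS-SVM).

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Successive projections: return n_select column indices, beginning at `start`.
    Assumes n_select does not exceed the rank of X."""
    P = np.array(X, dtype=float)              # working copy of the candidate columns
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Project all columns onto the subspace orthogonal to the last selected column.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0                # never pick the same column twice
        selected.append(int(np.argmax(norms)))
    return selected

def rmse_of_subset(Xc, yc, Xv, yv, cols):
    """Fit least squares (with intercept) on calibration data, return validation RMSE."""
    A = np.column_stack([np.ones(len(Xc)), Xc[:, cols]])
    coef, *_ = np.linalg.lstsq(A, yc, rcond=None)
    pred = np.column_stack([np.ones(len(Xv)), Xv[:, cols]]) @ coef
    return float(np.sqrt(np.mean((yv - pred) ** 2)))

def spa_select(Xc, yc, Xv, yv, max_vars):
    """Try every starting column and chain length; keep the subset with lowest RMSE."""
    best_err, best_cols = np.inf, []
    for start in range(Xc.shape[1]):
        chain = spa_chain(Xc, start, max_vars)
        for k in range(1, max_vars + 1):
            err = rmse_of_subset(Xc, yc, Xv, yv, chain[:k])
            if err < best_err:
                best_err, best_cols = err, sorted(chain[:k])
    return best_err, best_cols

# Synthetic example: 8 candidate columns standing in for the first eight PC scores.
rng = np.random.default_rng(0)
scores = rng.normal(size=(60, 8))
y = 2.0 * scores[:, 0] - scores[:, 3]         # target depends on columns 0 and 3 only
err, cols = spa_select(scores[:40], y[:40], scores[40:], y[40:], max_vars=8)
print("selected columns:", cols, "validation RMSE:", err)
```

In this sketch the columns returned by `spa_select` would then be passed as inputs to the final regression model. Because the search covers only one chain per starting column, the cost grows with the number of candidates rather than with the number of all possible subsets.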
WU Di, JIN Chun-hua, HE Yong. Study on Combinatorial Optimization of Spectral Principal Components Using Successive Projections Algorithm [J]. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2009, 29(10): 2734-2737.