LLE-PLS Nonlinear Modeling Method for Near Infrared Spectroscopy and Its Application
YANG Hui-hua1,2, QIN Feng2, WANG Yong1,3, WU Yun-ming4, SHI Xiao-hao4, LIANG Qiong-lin1, WANG Yi-ming1, LUO Guo-an1,3*
1. Analysis Center, Tsinghua University, Beijing 100084, China
2. College of Computer and Control, Guilin University of Electronic Technology, Guilin 541004, China
3. The Modern Research Center for TCM, East China University of Science and Technology, Shanghai 200237, China
4. Green Valley Holding Co., Ltd., Shanghai 201203, China
Abstract Partial least squares (PLS), the traditional modeling algorithm for near infrared (NIR) spectra, cannot effectively capture the nonlinear correlations between NIR spectra and the chemical or physical properties of samples. Locally linear embedding (LLE) is a recently proposed nonlinear dimension reduction algorithm from the family of manifold learning methods. It effectively finds the intrinsic dimension of high-dimensional data and maps the high-dimensional input points to global low-dimensional coordinates while preserving the spatial relations between adjacent points, i.e. the geometric structure of the high-dimensional space. No application of LLE to the processing of NIR spectra has been reported. By combining LLE and PLS, a novel nonlinear modeling method for NIR spectra, LLE-PLS, is proposed, in which LLE reduces the dimension of the NIR spectra and PLS builds the regression model. The LLE-PLS method was applied to correlate NIR spectra with the concentration of salvia acid B in the eluate from column chromatography of Salvianolate. The results showed that LLE-PLS outperformed other preprocessing methods such as multiplicative scattering correction, the 1st derivative, vector normalization, minimum-maximum normalization, detrend, debias, and the 2nd derivative. After parameter optimization, LLE-PLS accurately predicted the concentration of salvia acid B, with a minimum RMSECV of 0.128 mg·mL⁻¹ and an r² of 0.9988, suggesting that LLE-PLS is better than PLS in modeling and prediction. Two parameters of LLE-PLS, the number of nearest neighbors k and the output dimension d, affect the performance of the method. The research showed that RMSECV is robust to the choice of k, whereas an excessively low or high output dimension d results in a larger error because of insufficient or excessive information extraction.
It can be concluded that LLE-PLS can effectively model the nonlinear correlations between spectra and the physicochemical properties of samples, and that online monitoring of the column chromatography of Salvianolate is feasible by coupling NIR spectroscopy with the LLE-PLS modeling method.
Key words: Locally linear embedding; Partial least squares; Near infrared spectroscopy; Salvianolate
Corresponding Authors:
LUO Guo-an
E-mail: luoga@tsinghua.edu.cn
Cite this article:
YANG Hui-hua, QIN Feng, WANG Yong, et al. LLE-PLS Nonlinear Modeling Method for Near Infrared Spectroscopy and Its Application[J]. Spectroscopy and Spectral Analysis, 2007, 27(10): 1955-1958.