Isomap-PLS Nonlinear Modeling Method for Near Infrared Spectroscopy
YANG Hui-hua1,2, QIN Feng1, WANG Yi-ming2, LUO Guo-an2
1. College of Computer and Control, Guilin University of Electronic Technology, Guilin 541004, China 2. Modern Research Center of Traditional Chinese Medicine, Tsinghua University, Beijing 100084, China
Abstract: To model the nonlinear relationship between the near infrared (NIR) spectra of samples and their chemical or physical properties, this paper proposes a novel modeling method, referred to as Isomap-PLS, that combines Isomap with partial least squares (PLS). Isomap is a recently proposed nonlinear dimension reduction algorithm from the manifold learning family, a new branch of machine learning. It is based on the multidimensional scaling (MDS) algorithm, but it replaces the Euclidean distance used in MDS with an approximated geodesic distance, which lets it recover the intrinsic low-dimensional structure of high-dimensional data. In Isomap-PLS, Isomap first extracts nonlinear information from the high-dimensional NIR spectra while preserving their geometric properties, and PLS then removes redundant linear information and builds the calibration model. The two Isomap parameters, the number of nearest neighbors k and the output dimension d, affect the performance of the method, so they were optimized with a grid search. Isomap-PLS was applied to two public benchmark NIR datasets, and the results were compared with those of PLS. On both datasets, every model built with Isomap-PLS had a smaller root mean square error of cross-validation (RMSECV) than the corresponding PLS model, and for some properties the RMSECV was reduced by a factor of 2-5. It can be concluded that, because Isomap captures the intrinsic nonlinear structure of NIR spectra, Isomap-PLS can effectively model the nonlinear correlations between the spectra and the physicochemical properties of the samples, and it therefore has greater calibration and prediction power than PLS.