1. Key Laboratory of Micro/Nano Devices and System Technology, Chongqing University, Chongqing 400030, China
2. R&D Center of Micro/Nano System and New Material Technology, Chongqing University, Chongqing 400030, China
3. Key Laboratory of Optoelectronic Technology and System, Chongqing University, Chongqing 400030, China
4. School of Software Engineering, Chongqing University, Chongqing 400030, China
Abstract: Manifold learning is a family of algorithms from the field of machine learning that estimate the intrinsic dimensionality of large, complex data sets and extract the most important information from the raw data in order to build a regression or classification model. Its basic assumption is that high-dimensional data measured from the same object reside on a manifold of much lower dimension, determined by a few properties of that object. Because NIR spectra are high-dimensional and have complicated band assignments, it is reasonable to assume that the NIR spectra of the same kind of substance at different chemical concentrations reside on a low-dimensional manifold determined by those concentrations. Locally linear embedding (LLE), one of the best-known manifold learning algorithms, further assumes that the underlying manifold is locally linear, so that every data point on the manifold can be approximated by a linear combination of its neighbors. Based on these assumptions, the present paper proposes a new algorithm named least square locally weighted regression (LS-LWR), a form of LWR in which the weights are determined by least squares rather than by a predefined function. The NIR spectra of glucose solutions of various concentrations were measured with a NIR spectrometer, and LS-LWR was verified by quantitatively predicting the glucose concentrations. Compared with existing algorithms such as principal component regression (PCR) and partial least squares regression (PLSR), LS-LWR shows better predictive ability, as measured by the standard error of prediction (SEP), and yields an elegant model with good stability and efficiency.
Key words: Near infrared spectra; Manifold learning; Locally linear embedding; Locally weighted regression
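The LS-LWR idea described in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption rather than the authors' method: the weights are taken to be the LLE-style reconstruction weights (least squares with a sum-to-one constraint), a small regularization term is added for numerical stability, and the function name `ls_lwr_predict`, its parameters, and the synthetic "spectra" are hypothetical.

```python
import numpy as np

def ls_lwr_predict(X_cal, y_cal, x_new, k=4, reg=1e-3):
    """Predict a concentration for x_new from its k nearest calibration spectra.

    Assumption: weights are LLE-style reconstruction weights, i.e. the
    least-squares solution of x_new ~ sum_i w_i * x_i with sum_i w_i = 1.
    """
    # Find the k calibration spectra closest to the new spectrum.
    dist = np.linalg.norm(X_cal - x_new, axis=1)
    idx = np.argsort(dist)[:k]

    # Local Gram matrix of the neighbors, centered on the new spectrum.
    Z = X_cal[idx] - x_new
    G = Z @ Z.T

    # Regularize (an added assumption) so the system stays solvable
    # when the neighbors are nearly collinear.
    scale = np.trace(G)
    G = G + reg * (scale if scale > 0 else 1.0) * np.eye(k)

    # Solve G w = 1, then rescale so the weights sum to one.
    w = np.linalg.solve(G, np.ones(k))
    w = w / w.sum()

    # Apply the same weights to the neighbors' known concentrations.
    return float(w @ y_cal[idx])

# Synthetic data for illustration only: each "spectrum" is a baseline plus
# a pure-component profile scaled by concentration.
rng = np.random.default_rng(0)
conc = np.linspace(0.0, 1.0, 11)
pure = rng.random(50)
baseline = rng.random(50)
X_cal = conc[:, None] * pure + baseline
x_new = 0.55 * pure + baseline
pred = ls_lwr_predict(X_cal, conc, x_new, k=4)
```

In this sketch the weights come from the data themselves, through the local least-squares fit, instead of from a fixed distance kernel as in conventional LWR. The neighbor count `k` and the regularization strength `reg` are tuning choices that the abstract does not specify.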