Rapid and Robust Partial Least Squares Regression and Its Application to NIR Spectroscopy Analysis
CHENG Zhong1,2, CHEN De-zhao1*
1. Department of Chemical and Biochemical Engineering, Zhejiang University, Hangzhou 310027, China
2. Department of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou 310012, China
Abstract: Modern near-infrared spectroscopy (NIRS) is an indirect analytical technique: quantitative analysis of unknown samples is carried out with a model established from calibration samples. Taking into account the low sensitivity and poor disturbance rejection of NIRS, this paper constructs a new robust version of the SIMPLS algorithm from a robust covariance matrix for high-dimensional data and robust linear regression. Because SIMPLS is based on the empirical cross-covariance matrix between the response variables and the regressors and on linear least squares regression, its results are affected by abnormal observations in the data set. To eliminate their negative impact on the accuracy and reliability of the model, a simple multivariate outlier-detection procedure and a robust estimator of the covariance matrix are embedded in the SIMPLS regression framework; both use information obtained from projections onto the directions that maximize and minimize the kurtosis coefficient of the projected data. Finally, the proposed kurtosis-SIMPLS method is applied to NIR analysis and compared with SIMPLS. The results show that kurtosis-SIMPLS not only identifies the actual outliers in the data set with less computational cost, but also gives better and more stable prediction performance for the normal samples.
Key words: Partial least squares; Outlier detection; Kurtosis; Robust regression; Near infrared spectroscopy; Quantitative analysis
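The projection idea mentioned in the abstract, scoring directions by the kurtosis coefficient of the projected data, can be illustrated with a minimal Python sketch. This is not the authors' kurtosis-SIMPLS implementation. The function names, the random search over candidate directions (a crude stand-in for a proper optimization of the kurtosis), and the median/MAD cutoff of 3.5 are all assumptions made for illustration only.

```python
import numpy as np


def projected_kurtosis(X, d):
    """Kurtosis coefficient of the rows of X (n x p) projected onto direction d."""
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)          # use a unit-length direction
    z = X @ d                          # one-dimensional projection
    z = z - z.mean()
    return np.mean(z ** 4) / np.mean(z ** 2) ** 2


def extreme_kurtosis_directions(X, n_candidates=2000, seed=0):
    """Random search for directions of maximal and minimal projected kurtosis.

    Assumption: a random search replaces the optimization step a real
    method would use; it is only adequate for low-dimensional toy data.
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n_candidates, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    k = np.array([projected_kurtosis(X, d) for d in D])
    return D[np.argmax(k)], D[np.argmin(k)]


def flag_outliers(X, directions, cutoff=3.5):
    """Flag rows whose robust z-score (median/MAD) exceeds cutoff on any direction.

    Assumption: the cutoff of 3.5 is a conventional choice, not a value
    taken from the paper.
    """
    flags = np.zeros(X.shape[0], dtype=bool)
    for d in directions:
        z = X @ d
        med = np.median(z)
        mad = 1.4826 * np.median(np.abs(z - med))  # consistent with sigma for normal data
        flags |= np.abs(z - med) / mad > cutoff
    return flags
```

On data with a small cluster of shifted observations, the maximum-kurtosis direction tends to point toward that cluster, so the shifted rows stand out in the one-dimensional projection. A robust regression such as the one described in the abstract would then downweight or exclude the flagged rows before estimating the covariance matrix.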