A New Wavelength Selection Algorithm Based on the Fusion of Multiple Models
HONG Ming-jian (1, 2), WEN Zhi-yu (1)
1. Micro-Electromechanical System Research Center of Chongqing University, Chongqing 400030, China
2. School of Software Engineering, Chongqing University, Chongqing 400030, China
Abstract: NIR spectra typically contain a large number of wavelengths but a much smaller number of samples. Some of these wavelengths contribute no information to the model. Worse, they may carry irrelevant information such as noise and background, which can produce an overly complex model, poor predictive ability, or both. It is therefore important to identify and eliminate such wavelengths to improve the quality of the final model. This paper first reviews variable selection methods based on a single PLS regression model and concludes that (1) cross-validation can select an optimal model with good predictive ability, but that model may not be suitable for selecting variables; and (2) selecting variables from a single regression model is inaccurate and unstable, because a single vector of regression coefficients may not measure the importance of the variables correctly and may change with model complexity (the number of latent variables). On the basis of this analysis, a new variable selection method based on the fusion of multiple PLS models is proposed. The method fuses the regression coefficients of multiple PLS models into a single vector, determines a threshold, and eliminates the variables whose corresponding elements in that vector fall below the threshold. The method is validated on three well-known NIR datasets and compared with the UVE-PLS and GA-PLS algorithms. The experiments show that the method can yield a model with lower complexity, better predictive ability, or both. Moreover, the method is simple and efficient, and can therefore be put to practical use.
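The abstract does not specify how the coefficient vectors are fused or how the threshold is chosen, so the following is only a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation. It fits a PLS1 model by NIPALS, computes the standard regression vector b_a = W_a (P_a^T W_a)^{-1} q_a for every complexity a = 1..A, and then (hypothetically) fuses them by averaging the absolute coefficients after scaling each vector by its own maximum, keeping variables whose fused value is at least the mean of the fused vector. The helper names `pls1_nipals`, `coefficients_per_complexity` and `fused_selection`, the normalise-and-average fusion rule, the mean threshold, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit PLS1 by NIPALS on mean-centred copies of X and y.

    Returns the weights W, X-loadings P and y-loadings q.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)       # unit-length weight vector
        t = X @ w                    # scores
        tt = t @ t
        p = X.T @ t / tt             # X-loadings
        q[a] = (y @ t) / tt          # y-loading
        X = X - np.outer(t, p)       # deflate X
        y = y - q[a] * t             # deflate y
        W[:, a], P[:, a] = w, p
    return W, P, q


def coefficients_per_complexity(W, P, q):
    """Regression vectors b_a = W_a (P_a^T W_a)^{-1} q_a, one row per a = 1..A."""
    coefs = []
    for a in range(1, W.shape[1] + 1):
        Wa, Pa = W[:, :a], P[:, :a]
        coefs.append(Wa @ np.linalg.solve(Pa.T @ Wa, q[:a]))
    return np.array(coefs)           # shape (A, n_vars)


def fused_selection(X, y, max_components=5):
    """Hypothetical fusion: average the max-normalised |b_a| over a = 1..A,
    then keep variables whose fused value is at least the fused mean."""
    W, P, q = pls1_nipals(X, y, max_components)
    B = np.abs(coefficients_per_complexity(W, P, q))
    fused = (B / B.max(axis=1, keepdims=True)).mean(axis=0)
    threshold = fused.mean()         # assumed threshold rule
    selected = np.flatnonzero(fused >= threshold)
    return fused, selected


# Synthetic example (assumed data): only variables 0, 1 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=100)
fused, selected = fused_selection(X, y, max_components=5)
print(selected)
```

Scaling each coefficient vector by its own maximum before averaging is one way to keep a single model from dominating the fused vector. Comparing the rows returned by `coefficients_per_complexity` is also a quick way to see point (2) above: the ranking of coefficient magnitudes can shift as the number of latent variables grows.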
HONG Ming-jian, WEN Zhi-yu (洪明坚, 温志渝). A New Wavelength Selection Algorithm Based on the Fusion of Multiple Models (一种多模型融合的近红外波长选择算法) [J]. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2010, 30(8): 2088-2092.