A Principal Components Selection Method Based on the Modified Randomization Test for Avoiding Over-Fit and Under-Fit in Spectra Calibration
LI Li-na, LI Qing-bo, YAN Hou-lai, ZHANG Guang-jun*
Key Laboratory of Precision Opto-Mechatronics Technology, Ministry of Education, School of Instrument Science and Opto-Electronics Engineering, Beihang University, Beijing 100191, China
Abstract: Using too many or too few principal components often yields an over-fitted or under-fitted quantitative calibration model. To avoid over-fit and under-fit in spectra calibration, a principal components selection method based on a modified randomization test is proposed. Three near infrared spectra experiments, in which the complexity of the sample components increases step by step, are used to evaluate the proposed method, and the method is compared with cross-validation. How the complexity of the spectra model affects the prediction performance of the calibration is discussed, as is the adaptability of the modified randomization test to complex spectra models with uncertainty. The results indicate that, unlike cross-validation, the proposed method does not leave any samples out: all the training samples are considered when selecting principal components, so over-fit and under-fit can be avoided, which helps improve the prediction performance of calibration in spectral analysis. The modified randomization test differs from the commonly used randomization test in that it introduces a simplified criterion, which makes it easy to implement. With the proposed method, users can select principal components through a visualized and interactive process. For the three experiments, 4, 5 and 8 principal components, respectively, are selected and employed in calibration, and these give the best prediction results on the independent external prediction sets. The results also imply that the proposed method is adaptable to complex samples with many variables and few samples.
Key words: Spectral analysis; Quantitative calibration; Randomization test; Partial least squares; Principal component
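The abstract does not spell out the simplified criterion of the modified test, so the following is only a minimal sketch of the commonly used permutation form of a randomization test for choosing the number of PLS components, not the authors' method. It assumes a single response (PLS1), NumPy, and a significance level of 0.05; the function names `first_pls_score` and `select_components_by_randomization` and all parameter names are hypothetical. For each candidate component, the covariance between the score and the response is compared with the values obtained after randomly permuting the response, using all training samples and leaving none out.

```python
import numpy as np


def first_pls_score(X, y):
    """Return the first PLS1 weight vector and score for centered X and y."""
    w = X.T @ y
    norm = np.linalg.norm(w)
    if norm == 0.0:
        # No covariance left between X and y; return an all-zero direction.
        return w, X @ w
    w = w / norm
    return w, X @ w


def select_components_by_randomization(X, y, max_components=10,
                                       n_perm=200, alpha=0.05, seed=0):
    """Standard permutation test (assumed here, not the paper's modified one).

    Components are added one at a time and kept while the score/response
    covariance is significantly larger than for a permuted response.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)          # column-center the spectra
    y = y - y.mean()                # center the response
    selected = 0
    p_values = []
    for a in range(max_components):
        _, t = first_pls_score(X, y)
        stat = abs(t @ y)           # observed test statistic
        null = np.empty(n_perm)
        for k in range(n_perm):
            y_perm = rng.permutation(y)
            _, t_perm = first_pls_score(X, y_perm)
            null[k] = abs(t_perm @ y_perm)
        # Permutation p-value with the usual +1 correction.
        p = (1 + np.sum(null >= stat)) / (n_perm + 1)
        p_values.append(p)
        tt = t @ t
        if p > alpha or tt == 0.0:
            break                   # component not significant: stop here
        selected = a + 1
        # Deflate X and y before testing the next component.
        X = X - np.outer(t, X.T @ t / tt)
        y = y - t * (t @ y) / tt
    return selected, p_values
```

Calling `select_components_by_randomization(X, y)` with a samples-by-wavelengths matrix `X` and a concentration vector `y` returns the number of components judged significant together with the p-value of each tested component, which can be plotted to inspect the selection visually.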
LI Li-na, LI Qing-bo, YAN Hou-lai, ZHANG Guang-jun*. A Principal Components Selection Method Based on the Modified Randomization Test for Avoiding Over-Fit and Under-Fit in Spectra Calibration. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2010, 30(11): 3041-3046.