An Optimal Selection Method of Samples of Calibration Set and Validation Set for Spectral Multivariate Analysis
LIU Wei1, ZHAO Zhong1*, YUAN Hong-fu2, SONG Chun-feng2, LI Xiao-yu2
1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China 2. College of Materials Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
Abstract: This paper analyzes the side effects in spectral multivariate modeling caused by an uneven distribution of samples across the property range of the calibration set and validation set. It also demonstrates the “average” phenomenon in spectral multivariate calibration, in which samples with small property values are predicted too high and samples with large property values are predicted too low. Considering the distribution of the spectral space and the property space simultaneously, a new optimal sample selection method named Rank-KS is proposed, which aims to improve the uniformity of the calibration set and validation set. The Y-space is divided uniformly into several regions, and within every region the calibration samples are extracted by the Kennard-Stone (KS) algorithm and the validation samples by the Random-Select (RS) algorithm, so that the calibration set is evenly distributed and strongly representative. The proposed method was applied to the prediction of dimethyl carbonate (DMC) content in gasoline from infrared spectra and of dimethyl sulfoxide content in its aqueous solution from near-infrared spectra. The “average” phenomenon observed in the predictions of the multiple linear regression (MLR) model of dimethyl sulfoxide was effectively weakened by Rank-KS. For comparison, MLR and PLS1 models of DMC and dimethyl sulfoxide were constructed with calibration sets selected by RS, KS, Rank-Select, sample set partitioning based on joint X- and Y-blocks (SPXY), and the proposed Rank-KS algorithm. The application results verified that Rank-KS achieved the best prediction. In particular, when the sample set has many samples in the middle of the property range and few at the boundaries, or has no samples in some local region, the prediction of a model built on a calibration set selected by Rank-KS improves markedly.
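The region-wise selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `kennard_stone` and `rank_ks`, the parameters `n_regions`, `n_cal_per_region` and `n_val_per_region`, the use of equal per-region quotas, and the choice to draw validation samples at random from whatever KS leaves over in each region are all hypothetical, since the abstract does not specify these details.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X picked by the Kennard-Stone rule."""
    n = X.shape[0]
    if n_select >= n:
        return list(range(n))
    # Pairwise Euclidean distances in spectral (X) space.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start from the two mutually most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    i, j = int(i), int(j)
    if i == j:  # all samples identical: fall back to a neighbouring index
        j = (i + 1) % n
    selected = [i, j]
    while len(selected) < n_select:
        # Distance from every sample to its nearest already-selected sample.
        min_d = dist[:, selected].min(axis=1)
        min_d[selected] = -1.0  # never pick the same sample twice
        selected.append(int(np.argmax(min_d)))
    return selected[:n_select]


def rank_ks(X, y, n_regions=5, n_cal_per_region=10, n_val_per_region=5, seed=0):
    """Split samples into calibration and validation index arrays.

    Assumed reading of the abstract: Y-space is cut into equal-width
    regions; KS picks calibration samples inside each region, and the
    validation samples are drawn at random from the rest of that region.
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(y.min(), y.max(), n_regions + 1)
    # Region label 0..n_regions-1 for every sample (interior edges only).
    region = np.digitize(y, edges[1:-1])
    cal, val = [], []
    for r in range(n_regions):
        idx = np.flatnonzero(region == r)
        if idx.size == 0:
            continue  # a local gap in Y-space: nothing to select here
        n_cal = min(n_cal_per_region, idx.size)
        picked = idx[kennard_stone(X[idx], n_cal)]
        cal.extend(picked.tolist())
        rest = np.setdiff1d(idx, picked)
        n_val = min(n_val_per_region, rest.size)
        if n_val > 0:
            val.extend(rng.choice(rest, size=n_val, replace=False).tolist())
    return np.array(cal, dtype=int), np.array(val, dtype=int)


# Synthetic demonstration: property values crowded in the middle of the range.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(200, 6))
y_demo = demo_rng.normal(loc=50.0, scale=10.0, size=200)
cal_idx, val_idx = rank_ks(X_demo, y_demo)
print(len(cal_idx), len(val_idx))
```

Capping the number of samples taken per region, rather than taking a fixed fraction of each region, is what flattens a sample set that is dense in the middle and sparse at the boundaries; regions with fewer samples than the quota simply contribute all they have.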
LIU Wei, ZHAO Zhong*, YUAN Hong-fu, SONG Chun-feng, LI Xiao-yu. An Optimal Selection Method of Samples of Calibration Set and Validation Set for Spectral Multivariate Analysis [J]. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2014, 34(04): 947-951.