Determination of Hesperidin in Tangerine Leaf by Near-Infrared Spectroscopy with SPXY Algorithm for Sample Subset Partitioning and Monte Carlo Cross Validation
1. School of Chinese Pharmacy, Beijing University of Chinese Medicine, Beijing 100102, China 2. Department of Chemistry, Capital Normal University, Beijing 100037, China
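Abstract (Chinese version, translated): In building a PLS quantitative model from near-infrared spectra, the selection of training-set samples and the determination of the number of latent variables are both very important. Taking the determination of hesperidin in tangerine leaf as an example, this study compared how four training-set selection methods, random sampling (RS), Kennard-Stone (KS), duplex, and sample set partitioning based on joint x-y distance (SPXY), affect the model, and how leave-one-out cross validation and the Monte Carlo method affect the determination of the number of latent variables. The results show that the model built on the training set selected by SPXY was better than those built with the other three methods, and that the Monte Carlo method determined the number of latent variables well and effectively reduced the risk of over-fitting. The root mean square error of cross validation, the root mean square error of prediction, and the correlation coefficient of the prediction set for the resulting model were 0.768 1, 0.736 9, and 0.975 2, respectively.
Key words (Chinese version, translated): Near-infrared spectroscopy; Training set selection; SPXY; Number of latent variables; Monte Carlo method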
Abstract: Extracting a representative training set from a pool of real samples is crucial for calibration, and choosing an appropriate number of latent variables in PLS regression is difficult. Using the measurement of tangerine leaf as the test case, PLS models were built from training sets selected by SPXY and, for comparison, by random sampling, duplex, and Kennard-Stone. Two methods were applied to choose the dimension of the calibration model: leave-one-out cross validation and Monte Carlo cross validation. The correlation coefficient of the prediction model is 0.996 9, the RMSECV is 0.768 1, and the RMSEP is 0.736 9. These results indicate that SPXY is superior to the other three strategies, and that Monte Carlo cross validation avoids an unnecessarily large model and thereby decreases the risk of over-fitting the calibration model.
Key words: NIR spectrometry; Sample subset partitioning; SPXY; Number of latent variables; Monte Carlo cross validation
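The two procedures named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that SPXY sums the pairwise x-distance and y-distance, each divided by its maximum, and then selects samples Kennard-Stone style (the farthest pair first, then repeatedly the sample whose nearest selected neighbour is farthest away). It also assumes that Monte Carlo cross validation repeatedly leaves out a random fraction of the calibration samples and pools the squared errors for each candidate number of latent variables. The names `spxy_split`, `mccv_rmse`, `fit`, `predict`, `n_splits`, and `val_fraction` are hypothetical and chosen only for illustration.

```python
import math
import random


def spxy_split(X, y, n_train):
    """Pick n_train sample indices by joint x-y distance (SPXY-style sketch)."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    # Pairwise Euclidean distances between spectra and absolute differences in y.
    dx = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    # Normalise each distance by its maximum so x and y carry equal weight.
    max_dx = max(max(row) for row in dx) or 1.0
    max_dy = max(max(row) for row in dy) or 1.0
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
         for i in range(n)]
    # Start from the two samples that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: d[p[0]][p[1]])
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    # Max-min step: add the sample farthest from its nearest selected sample.
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


def mccv_rmse(X, y, fit, predict, complexities,
              n_splits=100, val_fraction=0.5, seed=0):
    """Return {complexity: RMSECV} pooled over random calibration/validation splits.

    fit(X_cal, y_cal, c) and predict(model, X_val) are user-supplied callables
    (hypothetical interface); c would be the number of latent variables.
    """
    n = len(X)
    n_val = int(round(val_fraction * n))
    if not 0 < n_val < n:
        raise ValueError("val_fraction leaves no samples on one side of the split")
    rng = random.Random(seed)
    sq_err = {c: 0.0 for c in complexities}
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        val, cal = idx[:n_val], idx[n_val:]
        for c in complexities:
            model = fit([X[i] for i in cal], [y[i] for i in cal], c)
            preds = predict(model, [X[i] for i in val])
            sq_err[c] += sum((p - y[i]) ** 2 for p, i in zip(preds, val))
    return {c: math.sqrt(sq_err[c] / (n_splits * n_val)) for c in complexities}
```

In practice `fit` and `predict` would wrap a PLS routine, for example scikit-learn's `PLSRegression(n_components=c)`, and the number of latent variables would be chosen from the returned RMSECV values. The all-pairs distance tables keep the sketch short but need memory quadratic in the number of samples.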
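展晓日1, 朱向荣2, 史新元1, 张卓勇2, 乔延江1*. SPXY sample partitioning and Monte Carlo cross validation combined with near-infrared spectroscopy for the determination of hesperidin in tangerine leaf [J]. 光谱学与光谱分析 (Spectroscopy and Spectral Analysis), 2009, 29(04): 964-968.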
ZHAN Xiao-ri1, ZHU Xiang-rong2, SHI Xin-yuan1, ZHANG Zhuo-yong2, QIAO Yan-jiang1*. Determination of Hesperidin in Tangerine Leaf by Near-Infrared Spectroscopy with SPXY Algorithm for Sample Subset Partitioning and Monte Carlo Cross Validation. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2009, 29(04): 964-968.