Identification of Transgenic Soybean Varieties Using Mid-Infrared Spectroscopy
FANG Hui1, ZHANG Zhao1, WANG Hai-long1, YANG Xiang-dong2, HE Yong1, BAO Yi-dan1*
1. College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China 2. Agriculture Biotechnology Research Center, Jilin Academy of Agriculture Science, Changchun 130033, China
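Summary: Transgenic technology is of great significance for increasing grain yield, protecting biodiversity and reducing the use of chemical pesticides, but it may also carry certain safety risks. Research on techniques for detecting and identifying transgenic crops is therefore receiving increasing attention. This study used mid-infrared spectroscopy to investigate the feasibility of discriminating between different varieties of transgenic soybeans and their parents. Spectra of three non-transgenic parent soybeans (HC6, JACK and W82) and their transgenic varieties were collected in the range 3 818~734 cm⁻¹. Partial least squares-discriminant analysis (PLS-DA) was used for discrimination; the classification accuracies for the three soybean groups were 96.67%, 96.67% and 83.33% on the calibration set and 83.33%, 85% and 85% on the prediction set. Three characteristic wavenumber selection methods, X-loading weights, variable importance in the projection (VIP) and second derivative (2-Der), were applied to the spectral data, and a PLS-DA model was built on the wavenumbers selected by each method; for all three soybean groups the calibration and prediction accuracies were no lower than 76.67% and 75%, respectively. Two feature information extraction methods, principal component analysis (PCA) and independent component analysis (ICA), were also applied to the spectral data, and PCA-PLS-DA and ICA-PLS-DA models were built; for all three soybean groups the calibration and prediction accuracies were no lower than 80% and 75%, respectively. The results show that mid-infrared spectroscopy can discriminate non-transgenic parents from transgenic varieties fairly accurately, offering a new approach to the nondestructive identification of transgenic soybeans. Combining characteristic wavenumber selection with feature information extraction also effectively reduces model complexity and computational load.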
Keywords: mid-infrared spectroscopy; transgenic soybean; characteristic wavenumber selection; feature information extraction
Abstract: Transgenic technology is of great significance for increasing food production, protecting biodiversity and reducing the use of chemical pesticides. However, it may also carry safety risks, so research on techniques for identifying genetically modified crops is attracting increasing attention. Mid-infrared spectroscopy combined with feature extraction methods was used to investigate the feasibility of identifying different kinds of transgenic soybeans in the wavenumber range of 3 818~734 cm⁻¹. For this purpose, partial least squares-discriminant analysis (PLS-DA) was employed as the pattern recognition method to classify three non-GMO parent soybeans (HC6, JACK and W82) and their transgenic soybeans. The classification accuracies of the calibration set were 96.67%, 96.67% and 83.33% for the three non-GMO parent soybeans and their transgenic soybeans, and those of the prediction set were 83.33%, 85% and 85%. X-loading weights, the variable importance in the projection (VIP) algorithm and the second derivative (2-Der) algorithm were applied to select sensitive wavenumbers, and a PLS-DA model was built on the wavenumbers selected by each method. With X-loading weights, the classification accuracies of the calibration set were 91.11%, 91.67% and 81.67%, and those of the prediction set were 80%, 80% and 75%. With the VIP algorithm, the classification accuracies of the calibration set were 94.44%, 95% and 76.67%, and those of the prediction set were 80%, 85% and 75%. With the 2-Der algorithm, the classification accuracies of the calibration set were 88.89%, 81.67% and 80%, and those of the prediction set were 76.67%, 75% and 75%. Principal component analysis (PCA) and independent component analysis (ICA) were applied to extract feature information. When the principal components were used as input to the PLS-DA model, the classification accuracies of the calibration set were 96.67%, 90% and 80%, and those of the prediction set were 80%, 90% and 80%. When the independent components were used as input to the PLS-DA model, the classification accuracies of the calibration set were 93.33%, 83.33% and 83.33%, and those of the prediction set were 83.33%, 75% and 75%. The overall results indicate that mid-infrared spectroscopy can distinguish non-GMO parent soybeans from their transgenic varieties with reasonable accuracy, which provides a new approach to the nondestructive testing of transgenic soybeans. Combining sensitive wavenumber selection with feature extraction methods yields more concise models and reduces the amount of computation.