Near Infrared Spectroscopy Analysis Based Machine Learning to Identify Haploids in Maize
LI Wei1, LI Jin-long1, LI Wei-jun2, LIU Li-wei3, LI Hao-guang2, CHEN Chen1, CHEN Shao-jiang1*
1. National Maize Improvement Center of China, Engineering Center for Maize Breeding of MOE, China Agricultural University, Beijing 100193, China
2. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
3. Beijing Tunyu Seed Co., Ltd., Beijing 100193, China
Abstract: Haploid identification is a key step in doubled haploid technology in maize. In this research, we studied the near-infrared transmission spectra of a large number of haploid and heterozygous diploid seeds to establish an accurate model for haploid identification. Comparing the average spectrum of all haploids with that of all heterozygous diploids showed that the absorption peak positions of the two spectra were almost the same, but haploid absorbance was slightly higher than that of heterozygous diploids, with larger differences at wavelengths of 940~1 120 and 1 180~1 316 nm. Based on the near-infrared spectra of haploids and heterozygous diploids from three different source germplasms, different machine learning algorithms were used to construct haploid selection models, the accuracies of models developed with different spectral preprocessing methods were compared, and the effect of the dataset on model evaluation was also studied. Among the models compared, the partial least squares method and the neural network algorithm reached high haploid identification accuracies of 95.42% and 93.26%, respectively. The results on the testing set were consistent with the accuracy of the model, indicating that the two algorithms are suitable for large-scale screening of haploids. With the partial least squares model, smoothing gave the best accuracy among the spectral preprocessing methods tested. Comparing models built from datasets of different sizes showed that enlarging the dataset within a certain range could improve model accuracy, and when the proportion of haploids was high enough, the recall of haploid prediction could reach up to 100%. In addition, haploids and heterozygous diploids that were difficult to identify by R1-nj color were selected to form a new dataset. The accuracy of the partial least squares model trained on this dataset was 93.39%. This shows the advantage of the NIR machine learning method for haploid identification: it can achieve accurate identification independent of R1-nj color expression. NIR haploid identification based on machine learning has high accuracy and efficiency, and the method can be further optimized as more data become available. This research paves the way for the intelligent identification of haploids.
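The workflow summarized in the abstract (smoothing the spectra, fitting a partial least squares model, then scoring accuracy and haploid recall) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give their code, window size, number of components, or decision threshold. The sketch assumes Python with NumPy, a moving-average smoother, a PLS1 (NIPALS) regression on 0/1 labels with a 0.5 cutoff, and fully synthetic spectra; all function names, parameters, and data below are hypothetical.

```python
import numpy as np

def smooth(X, window=5):
    """Moving-average smoothing of each spectrum (row); window should be odd."""
    half = window // 2
    # Repeat the edge values so the smoothed spectrum keeps its length.
    padded = np.pad(X, ((0, 0), (half, half)), mode="edge")
    kernel = np.ones(window) / window
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])

def pls1_fit(X, y, n_components=3):
    """PLS1 regression (NIPALS); returns coefficients and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # weight vector
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, x_mean, y_mean

def pls1_predict(X, model):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean

# Synthetic stand-in data (assumption): haploids (label 1) absorb slightly
# more than diploids (label 0) in one band, loosely mimicking the abstract.
rng = np.random.default_rng(0)
n_per_class, n_wavelengths = 150, 100
base = np.linspace(0.5, 1.5, n_wavelengths)
X = base + rng.normal(0.0, 0.02, size=(2 * n_per_class, n_wavelengths))
y = np.repeat([1.0, 0.0], n_per_class)
X[y == 1, 30:60] += 0.1

X_smooth = smooth(X)
idx = rng.permutation(len(y))
train, test = idx[:200], idx[200:]

model = pls1_fit(X_smooth[train], y[train])
pred = (pls1_predict(X_smooth[test], model) >= 0.5).astype(float)

accuracy = np.mean(pred == y[test])
tp = np.sum((pred == 1) & (y[test] == 1))   # haploids correctly identified
fn = np.sum((pred == 0) & (y[test] == 1))   # haploids missed
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.3f}, haploid recall={recall:.3f}")
```

Recall here is the fraction of true haploids that the model recovers, which is the quantity the abstract reports as reaching up to 100% when the haploid proportion is high. The numbers printed by this sketch come from synthetic data and say nothing about the accuracies reported in the paper.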
LI Wei, LI Jin-long, LI Wei-jun, LIU Li-wei, LI Hao-guang, CHEN Chen, CHEN Shao-jiang. Near Infrared Spectroscopy Analysis Based Machine Learning to Identify Haploids in Maize [J]. Spectroscopy and Spectral Analysis, 2018, 38(09): 2763-2769.