Combine Hyperspectral Imaging and Machine Learning to Identify the Age of Cotton Seeds
DUAN Long1, YAN Tian-ying1, WANG Jiang-li2, 3, YE Wei-xin1, CHEN Wei1, GAO Pan1, 2*, LÜ Xin2, 3*
1. College of Information Science and Technology, Shihezi University, Shihezi 832003, China
2. The Key Laboratory of Oasis Eco-Agriculture, Xinjiang Production and Construction Corps, Shihezi 832003, China
3. College of Agriculture, Shihezi University, Shihezi 832003, China
Abstract: At present, precision cotton seeding has been promoted comprehensively in the Xinjiang Production and Construction Corps. It can reliably meet the agronomic standard of one seed per hole, but it also places higher demands on the screening of high-quality cotton seeds. Seeds carried over from previous years may have lost vigor and lower the germination rate. To avoid this, near-infrared (NIR) hyperspectral imaging (HSI) combined with machine learning can be used to identify the year of cotton seeds with high precision and to screen them quickly and nondestructively. A total of 1 440 cotton seeds with no visible difference in appearance were collected in 2016, 2017, 2018, and 2019 (360 seeds per year) as samples and divided into a training set, a validation set, and a test set at a ratio of 3:1:1. Hyperspectral images covering 915–1 698 nm were acquired in batches of 60 seeds. The average spectrum of each seed was extracted over 1 002–1 602 nm, discarding the obviously noisy bands at both ends, and used as the raw data. The spectra were preprocessed with Savitzky-Golay (SG) smoothing, and 13 effective wavelengths were selected by the principal component analysis loading (PCA-loading) method. Six classification models, namely logistic regression (LR), partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), recurrent neural network (RNN), long short-term memory network (LSTM), and convolutional neural network (CNN), were built on both the full spectra and the effective wavelengths. With the full spectra, the test-set identification accuracies of the six models were 96.27%, 98.98%, 99.32%, 96.95%, 97.63%, and 100%, respectively, with CNN and SVM performing best.
With the effective wavelengths, the test-set identification accuracies of the six models were 93.56%, 97.29%, 98.30%, 95.25%, 94.24%, and 99.66%, respectively, and CNN and SVM again gave the best classification results. Thus, all six models achieved high-precision identification of cotton seed years with the full spectra, and CNN and SVM still exceeded 98% accuracy with only the effective wavelengths. The deep learning methods generally outperformed the traditional machine learning methods, although the traditional methods still maintained good identification accuracy. Therefore, combining near-infrared hyperspectral imaging with machine learning enables high-precision identification of cotton seed years, providing a theoretical foundation and a method for selecting high-quality cotton seeds in precision sowing.
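The spectral preprocessing chain described in the abstract (per-seed average spectrum, trimming to 1 002–1 602 nm, SG smoothing, PCA-loading wavelength selection) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `seed_mean_spectrum`, `sg_smooth`, and `pca_loading_select` are hypothetical; the 256-band grid, the SG window of 11 and polynomial order of 2, the use of 3 principal components, and the rule "keep the strongest local peaks of the absolute loading curves" are illustrative choices rather than settings reported in the paper; and the data are random placeholders.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter
from sklearn.decomposition import PCA


def seed_mean_spectrum(cube, mask, wavelengths, lo=1002.0, hi=1602.0):
    """Average the pixels of one seed and keep only bands in [lo, hi] nm.

    cube: (rows, cols, bands) image; mask: boolean (rows, cols) seed region.
    """
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return cube[mask][:, keep].mean(axis=0), wavelengths[keep]


def sg_smooth(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing of each row (one spectrum per seed) along the band axis."""
    return savgol_filter(spectra, window_length=window_length, polyorder=polyorder, axis=1)


def pca_loading_select(spectra, n_components=3, n_select=13):
    """Return sorted band indices at the strongest peaks of the |PCA loading| curves.

    Hypothetical selection rule: a band is a candidate if it is a local peak of
    |loading| for some retained component; candidates are ranked by that magnitude.
    """
    pca = PCA(n_components=n_components).fit(spectra)
    best = {}  # band index -> largest |loading| at which it appears as a local peak
    for loading in np.abs(pca.components_):
        peaks, _ = find_peaks(loading)
        for i in peaks:
            best[int(i)] = max(best.get(int(i), 0.0), float(loading[i]))
    ranked = sorted(best, key=best.get, reverse=True)[:n_select]
    return np.array(sorted(ranked), dtype=int)


# Demo on placeholder data (assumed band grid and random values, not the paper's data)
rng = np.random.default_rng(0)
full_nm = np.linspace(915, 1698, 256)          # assumed camera band grid
cube = rng.random((20, 20, full_nm.size))      # one fake image tile
mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 5:15] = True                        # fake seed region
mean_spec, band_nm = seed_mean_spectrum(cube, mask, full_nm)

raw = rng.normal(size=(300, band_nm.size))     # stand-in for 300 averaged seed spectra
smoothed = sg_smooth(raw)
selected_idx = pca_loading_select(smoothed)
print(band_nm[selected_idx].round(1))          # wavelengths (nm) of the selected bands
```

On real data, `raw` would be the stacked output of `seed_mean_spectrum` for every seed. Picking local peaks rather than simply the largest absolute loadings avoids returning a run of adjacent, highly correlated bands.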
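The 3:1:1 split of 360 seeds per year gives 216, 72, and 72 seeds per year, or 864, 288, and 288 seeds overall. A minimal sketch of that stratified split, followed by one of the six models (an RBF-kernel SVM) tuned on the validation set and scored once on the test set, is shown below using scikit-learn. Assumptions: the feature matrix is synthetic (13 columns standing in for the 13 effective wavelengths, with a made-up year-dependent offset), and the candidate values for the SVM penalty `C` and the standardization step are illustrative, not the paper's reported configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 4 years x 360 seeds, 13 features (stand-ins for effective wavelengths).
rng = np.random.default_rng(42)
years = np.repeat([2016, 2017, 2018, 2019], 360)
class_offset = (years - 2016)[:, None] * 0.5   # hypothetical year-dependent shift
X = 1.0 + class_offset + rng.normal(0.0, 0.1, size=(years.size, 13))

# 3:1:1 split stratified by year: hold out 1/5 as test, then another 1/5 of the total as validation.
n = years.size
X_rest, X_test, y_rest, y_test = train_test_split(
    X, years, test_size=n // 5, stratify=years, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=n // 5, stratify=y_rest, random_state=0)

# Choose the SVM penalty C on the validation set only.
best_C, best_val = None, -1.0
for C in (0.1, 1.0, 10.0):
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C))
    val_acc = model.fit(X_train, y_train).score(X_val, y_val)
    if val_acc > best_val:
        best_C, best_val = C, val_acc

# Refit with the chosen C and report test accuracy once.
final = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=best_C)).fit(X_train, y_train)
test_acc = final.score(X_test, y_test)
print(len(y_train), len(y_val), len(y_test), best_C, round(test_acc, 4))
```

Keeping the test set out of the `C` search is what makes the reported test accuracy an honest estimate. The same split can be reused unchanged for the other five models.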
DUAN Long, YAN Tian-ying, WANG Jiang-li, YE Wei-xin, CHEN Wei, GAO Pan, LÜ Xin. Combine Hyperspectral Imaging and Machine Learning to Identify the Age of Cotton Seeds. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2021, 41(12): 3857-3863.