Merging MIR and NIR Spectral Data for Flavor Style Determination
SHA Yun-fei1, HUANG Wen1, WANG Liang1, LIU Tai-ang2, YUE Bao-hua2, LI Min-jie2, YOU Jing-lin2, GE Jiong1*, XIE Wen-yan1*
1. Technology Center of Shanghai Tobacco Group Co., Ltd., Shanghai 200082, China
2. Department of Chemistry, Shanghai University, Shanghai 200444, China
Abstract: Determining the flavor style of tobacco is an important task in the tobacco industry. In this work, 189 tobacco samples of different flavor styles were measured by mid-infrared (MIR) and near-infrared (NIR) spectroscopy. From the measured spectra, 21 characteristic absorption values at selected wavelengths in the MIR spectrum and 13 characteristic absorption values at selected wavelengths in the NIR spectrum were chosen as the main variables. The characteristic data extracted from the MIR and NIR spectra were first submitted to principal component analysis (PCA) separately; the PCA projections showed poor classification when the MIR or NIR data were used alone. The MIR and NIR variables were then merged and submitted to PCA together. The PCA projection calculated from the merged data showed good classification: the three flavor styles (fen-flavor style, medium flavor style and robust flavor style) were clearly separated into their own categories. After the PCA, two variable-selection algorithms, a step-back algorithm and a genetic algorithm, were applied to the 34 variables used in the PCA model; they selected 24 and 19 variables, respectively. Comparing the projection patterns obtained with the variables selected by each algorithm, we found that although the genetic algorithm used the fewest variables, its classification result was as good as those obtained with the full set of PCA variables and with the step-back selection. The genetic algorithm was therefore chosen to produce the projection plot, which separated the three flavor styles into different regions of the projection using the smallest set of variables from the merged MIR and NIR data. Finally, a support vector classification (SVC) model was built on the variables selected by the genetic algorithm to discriminate the tobacco flavor styles.
The accuracy of the model was 92.72%, and the accuracies for the fen-flavor, medium flavor and robust flavor styles were 93.75%, 92.11% and 91.84%, respectively. The predictive ability of the model was assessed by leave-one-out cross-validation (LOOCV): the overall LOOCV accuracy was 88.24%, with 90.63%, 86.84% and 87.76% for the fen-flavor, medium flavor and robust flavor styles, respectively. For the prediction of unknown samples, the overall accuracy was 86.84%, with 88.24%, 85.71% and 85.71% for the three styles, respectively. The accuracy thus exceeded 85% in the model test, the LOOCV test and the prediction of unknown samples. These results show that merging MIR and NIR spectral data provides more information for building the mathematical model and offers an efficient route to rapid discrimination of tobacco flavor styles.
Key words: Middle infrared spectrum; Near infrared spectrum; Tobacco flavor; Data fusion
沙云菲,黄 雯,王 亮,刘太昂,岳宝华,李敏杰,尤静林,葛 炯,谢雯燕. 中红外和近红外数据融合的香型风格判别[J]. 光谱学与光谱分析, 2021, 41(02): 473-477.
SHA Yun-fei, HUANG Wen, WANG Liang, LIU Tai-ang, YUE Bao-hua, LI Min-jie, YOU Jing-lin, GE Jiong, XIE Wen-yan. Merging MIR and NIR Spectral Data for Flavor Style Determination. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2021, 41(02): 473-477.
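The merging step and the leave-one-out check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The data are synthetic and hypothetical, constructed so that the MIR block separates only the fen-flavor class and the NIR block separates only the robust class. A nearest-centroid rule stands in for the SVC model, and PCA and variable selection are omitted. All function names are illustrative assumptions; only the feature counts (21 MIR, 13 NIR) and the three class names come from the abstract.

```python
import random
from statistics import mean, pstdev

N_MIR, N_NIR = 21, 13                   # feature counts quoted in the abstract
CLASSES = ("fen", "medium", "robust")   # the three flavor styles


def make_synthetic(n_per_class=10, seed=0):
    """Hypothetical data: MIR separates 'fen' only, NIR separates 'robust' only."""
    rng = random.Random(seed)
    mir, nir, labels = [], [], []
    for label in CLASSES:
        for _ in range(n_per_class):
            mir_shift = 1.0 if label == "fen" else 0.0
            nir_shift = 1.0 if label == "robust" else 0.0
            mir.append([mir_shift + rng.uniform(-0.1, 0.1) for _ in range(N_MIR)])
            nir.append([nir_shift + rng.uniform(-0.1, 0.1) for _ in range(N_NIR)])
            labels.append(label)
    return mir, nir, labels


def fuse(mir, nir):
    """Low-level data fusion: concatenate the MIR and NIR vectors of each sample."""
    return [m + n for m, n in zip(mir, nir)]


def autoscale_params(rows):
    """Column means and standard deviations (a zero deviation is replaced by 1)."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def autoscale(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def nearest_centroid(train_x, train_y, x):
    """Stand-in classifier (not SVC): pick the class with the closest mean vector."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [mean(c) for c in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(rows, labels):
    """Leave-one-out cross-validation: each sample is predicted by a model
    fitted (scaling included) on all the other samples."""
    correct = 0
    for i in range(len(rows)):
        train_x = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        means, stds = autoscale_params(train_x)
        scaled_train = [autoscale(r, means, stds) for r in train_x]
        pred = nearest_centroid(scaled_train, train_y, autoscale(rows[i], means, stds))
        correct += pred == labels[i]
    return correct / len(rows)


mir, nir, labels = make_synthetic()
fused = fuse(mir, nir)
acc_mir = loocv_accuracy(mir, labels)
acc_nir = loocv_accuracy(nir, labels)
acc_fused = loocv_accuracy(fused, labels)
print(f"LOOCV accuracy  MIR: {acc_mir:.2f}  NIR: {acc_nir:.2f}  fused: {acc_fused:.2f}")
```

On this constructed data the fused 34-variable vectors separate all three classes, whereas each block alone carries no information for telling two of the classes apart. This mirrors the qualitative finding of the abstract but says nothing about the real spectra. Scaling is refitted inside each fold so that the held-out sample does not influence the model that predicts it.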