Abstract: Near-infrared (NIR) spectra carry characteristic information about the hydrogen-containing groups of organic molecules in a sample, and they are high-dimensional and highly redundant. Traditional NIR spectroscopy modeling relies on shallow calibration models, such as principal component regression, partial least squares regression, artificial neural networks, and support vector regression, which cannot extract deep information from the spectral data. This paper proposes an NIR modeling method based on stacked supervised autoencoders (SSAE), which can fit the complex nonlinear relationship between spectral data and target physicochemical values and extract deep feature information from the data. First, the optimal preprocessing method is selected by comparing the effects of different spectral preprocessing methods on the model prediction results. The correlation coefficient method is then used to extract characteristic bands from the preprocessed spectra, and the resulting spectral data serve as the model input. Next, several supervised autoencoders are pre-trained under supervision of the target physicochemical values and stacked to form the stacked supervised autoencoder. The pre-trained parameters initialize the stacked model, which is then fine-tuned under supervision of the target physicochemical values to obtain the optimal model parameters. To verify the feasibility of stacked supervised autoencoder modeling, prediction models based on partial least squares regression, artificial neural networks, the stacked autoencoder, and the stacked supervised autoencoder were established on a corn water content data set and a yellow rice wine total acid content data set.
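The band-extraction step can be illustrated with a minimal sketch. It assumes that the correlation coefficient method means computing the Pearson correlation between each wavelength band and the target value and keeping the bands whose absolute correlation meets a cut-off. The function names, the threshold of 0.8, and the toy data are hypothetical and are not taken from the paper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant band carries no linear information
    return cov / math.sqrt(var_x * var_y)

def select_bands(spectra, targets, threshold=0.8):
    """Return indices of bands whose |r| with the target meets the threshold.

    spectra:   list of samples, each a list of absorbances (one per band)
    targets:   list of reference values, one per sample
    threshold: hypothetical cut-off, not specified in the source
    """
    n_bands = len(spectra[0])
    selected = []
    for j in range(n_bands):
        band = [sample[j] for sample in spectra]
        if abs(pearson_r(band, targets)) >= threshold:
            selected.append(j)
    return selected

# Toy data (illustrative only): band 0 rises with the target, band 1 is
# constant, band 2 falls with the target, band 3 is uncorrelated with it.
spectra = [
    [0.10, 0.50, 0.90, 0.30],
    [0.20, 0.50, 0.80, 0.10],
    [0.30, 0.50, 0.70, 0.40],
    [0.40, 0.50, 0.60, 0.20],
]
targets = [1.0, 2.0, 3.0, 4.0]
bands = select_bands(spectra, targets, threshold=0.8)
print(bands)
```

Taking the absolute value keeps strongly anti-correlated bands as well, since they are as informative for a regression model as positively correlated ones.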
The root mean square error and the residual predictive deviation are used to evaluate model performance. The accuracy of the four modeling methods (partial least squares regression, back-propagation artificial neural network, stacked autoencoder, and stacked supervised autoencoder) is compared and analyzed. The results show that the model established with the stacked supervised autoencoder predicts well: on the corn water content data set the two evaluation indexes reached 0.0611 and 4.271, and on the yellow rice wine total acid content data set they reached 0.1266 and 4.006, respectively, outperforming the other three methods.
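For reference, the two evaluation indexes can be computed as in the sketch below. It assumes the common chemometrics definition of the residual predictive deviation, namely the sample standard deviation of the reference values divided by the root mean square error of the predictions. The paper's exact formula is not given here, and the numbers are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of the reference values / RMSE.

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; some authors use the population form instead.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_t) ** 2 for t in y_true) / (n - 1))
    return sd / rmse(y_true, y_pred)

# Illustrative values only, not data from the paper.
y_true = [10.0, 11.0, 12.0, 13.0, 14.0]
y_pred = [10.5, 10.5, 12.5, 12.5, 14.5]
error = rmse(y_true, y_pred)
ratio = rpd(y_true, y_pred)
print(error, ratio)
```

A lower root mean square error and a higher residual predictive deviation both indicate a better model, which is why the two indexes are reported together.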
Keywords: Near-infrared spectroscopy; Deep learning; Stacked supervised autoencoder (SSAE); Quantitative calibration model