Analysis and Identification of Terahertz Tartaric Acid Spectral
Characteristic Region Based on Density Functional Theory and
Bootstrapping Soft Shrinkage Method
TANG Xin, ZHOU Sheng-ling*, ZHU Shi-ping*, MA Ling-kai, ZHENG Quan, PU Jing
College of Engineering and Technology, Southwest University, Chongqing 402160, China
Abstract: A terahertz time-domain spectrum carries chemical and physical information about the sample, but it also carries background information related to equipment noise, sample status and environmental parameters. This mixed content may degrade model performance and reduce prediction accuracy. Extracting the characteristic information of the target components, eliminating redundant variables and screening the characteristic spectral regions from spectral data acquired in a complex, overlapping and changing environment is therefore of great significance for the quantitative and qualitative analysis of terahertz spectra. This paper collected the THz absorption spectra of 342 L-tartaric acid samples with concentrations of 10%, 20%, 40%, 50%, 60% and 80%. The B3LYP method in density functional theory (DFT) was used to optimize the monomolecular model of L-tartaric acid with the 6-31G* (d, p) basis set, and the terahertz spectral characteristics of the monomolecular model were simulated theoretically. The molecular vibration modes corresponding to the characteristic absorption peaks were analyzed, and the absorption spectra in the 0.2~1.6 THz band were obtained. The measured absorption spectrum agrees well with the theoretical calculation. The characteristic regions of the terahertz absorption spectrum of L-tartaric acid were screened using bootstrapping soft shrinkage (BOSS). Competitive adaptive reweighted sampling (CARS-PLS), Monte Carlo uninformative variable elimination (MC-UVE-PLS) and interval partial least squares (iPLS) were then used for comparison to obtain a better model for identifying the characteristic spectral region. The analysis indicates that the effective spectral region obtained by the BOSS algorithm agrees better with the characteristic spectral region calculated by DFT. Modeling and regression analysis of the L-tartaric acid spectra were conducted using the full-spectrum PLS, CARS-PLS, MC-UVE-PLS, iPLS and BOSS algorithms.
The experimental results indicate that all four spectral region screening methods improve prediction accuracy compared with the full-spectrum PLS model. The BOSS algorithm shows the most significant improvement: its cross-validation root-mean-square error (RMSECV), prediction root-mean-square error (RMSEP), validation set determination coefficient (R2test) and test set determination coefficient (R2train) are 0.026 0, 0.026 0, 0.988 1 and 0.987 5, respectively, giving higher prediction accuracy and model stability than the other models. This study may therefore provide an effective method for rapid quantitative detection based on terahertz spectroscopy.
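The error measures quoted above follow standard definitions, which can be sketched in a few lines. The snippet below is only an illustration and not the authors' code: the function names and the six concentration values are hypothetical, chosen to mirror the concentration levels listed in the abstract. RMSECV would be obtained by applying the same root-mean-square error to cross-validation predictions, and RMSEP by applying it to predictions on a held-out set.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference concentrations (mass fraction) and model
# predictions, for illustration only; these are not data from the paper.
y_ref = [0.10, 0.20, 0.40, 0.50, 0.60, 0.80]
y_hat = [0.12, 0.19, 0.41, 0.48, 0.62, 0.79]

print(f"RMSE = {rmse(y_ref, y_hat):.4f}")
print(f"R^2  = {r_squared(y_ref, y_hat):.4f}")
```

A lower RMSE and an R² closer to 1 both indicate a closer fit, which is the sense in which the models above are ranked.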
唐 鑫,周胜灵,祝诗平,马羚凯,郑 权,普 京. 基于密度泛函理论与自举软缩减法的酒石酸太赫兹光谱特征谱区分析指认[J]. 光谱学与光谱分析, 2022, 42(09): 2740-2745.
TANG Xin, ZHOU Sheng-ling, ZHU Shi-ping, MA Ling-kai, ZHENG Quan, PU Jing. Analysis and Identification of Terahertz Tartaric Acid Spectral Characteristic Region Based on Density Functional Theory and Bootstrapping Soft Shrinkage Method. SPECTROSCOPY AND SPECTRAL ANALYSIS, 2022, 42(09): 2740-2745.