A Novel Metabolomic Data Scaling Method Based on K-L Divergence
DENG Ling-li1, 2, Cheng Kian-Kai3, SHEN Gui-ping1, ZHOU Ling1, LIU Xin-zhuo1, DONG Ji-yang1*, CHEN Zhong1
1. Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen 361005, China 2. Department of Communication Engineering, Xiamen University, Xiamen 361005, China 3. Department of Bioprocess Engineering & Institute of Bioproduct Development, Universiti Teknologi Malaysia, Skudai 81310, Malaysia
Abstract: A new scaling method based on Kullback-Leibler (K-L) divergence is proposed for NMR metabolomic data. The proposed method, called K-L scaling, is supervised because group information is incorporated into the scaling procedure. Because K-L divergence measures the difference between two datasets through their probability distributions, it can be applied to data that follow either Gaussian or non-Gaussian distributions. In K-L scaling, all variables are first standardized to unit variance, and their variances are then adjusted according to the K-L divergence to highlight significant variables. K-L scaling effectively identifies the differences in spectral data points between two experimental groups, increasing the weights of biologically relevant variables while reducing the weights of noise and uninformative variables. The method was applied to a 1H-NMR metabolomic dataset acquired from human urine. The results showed that the new scaling method efficiently suppresses the contribution of noise to the resulting multivariate model. In addition, it increases the weights of important variables and improves the interpretability and predictive ability of subsequent principal component regression (PCR) and partial least squares discriminant analysis (PLS-DA). Furthermore, the scaling method facilitated the identification of metabolic signatures. These results suggest that K-L scaling may be a useful alternative for the preprocessing of NMR-based metabolomic data.
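The two-step procedure described in the abstract (standardize to unit variance, then adjust each variable's variance by its K-L divergence) can be illustrated with a minimal Python sketch. The abstract does not state the exact weighting formula or how the distributions are estimated, so several choices here are assumptions rather than the authors' implementation: the function names `symmetric_kl` and `kl_scale`, the histogram-based density estimate with a pseudo-count `eps`, the symmetrized form of the divergence, and the rule that each variable's variance is set proportional to its divergence relative to the largest one.

```python
import numpy as np


def symmetric_kl(a, b, bins=10, eps=0.5):
    """Histogram estimate of KL(p||q) + KL(q||p) for two 1-D samples.

    Assumption: shared bin edges and a pseudo-count `eps` per bin, so that
    empty bins do not produce infinite divergences.
    """
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p = np.histogram(a, bins=edges)[0] + eps
    q = np.histogram(b, bins=edges)[0] + eps
    p, q = p / p.sum(), q / q.sum()
    # sum p*log(p/q) + sum q*log(q/p) simplifies to sum (p - q)*log(p/q)
    return float(np.sum((p - q) * np.log(p / q)))


def kl_scale(X, y, bins=10):
    """Scale the columns of X (samples x variables) using two-group labels y.

    Returns the scaled matrix and the per-variable weights in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    if labels.size != 2:
        raise ValueError("this sketch expects exactly two groups")

    # Step 1: standardize every variable to zero mean and unit variance.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant variables unscaled
    Z = (X - X.mean(axis=0)) / std

    # Step 2: estimate the between-group divergence of each variable.
    in_a, in_b = y == labels[0], y == labels[1]
    kl = np.array([symmetric_kl(Z[in_a, j], Z[in_b, j], bins)
                   for j in range(Z.shape[1])])

    # Assumed weighting: variance proportional to divergence, so the
    # standardized data are multiplied by the square root of the ratio.
    top = kl.max()
    weights = np.sqrt(kl / top) if top > 0 else np.ones_like(kl)
    return Z * weights, weights
```

Under these assumptions, a variable whose two group distributions coincide receives a weight near zero and contributes little to a subsequent multivariate model, while the most discriminating variable keeps unit variance. The bin count and pseudo-count control how noisy the divergence estimate is for small groups and would need tuning on real spectra.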