Abstract: A novel classifier was constructed in the present paper by combining an improved canonical variates analysis (ICVA) with Fisher linear discriminant analysis (LDA). The resulting discrimination model (ICVA-LDA) was divided into two parts: an inner part that estimated the robust weight vectors of the canonical variates with a linear partial least squares (PLS) algorithm, and an outer part that built the LDA discrimination model from the extracted canonical variates. The method used PLS regression as an engine for solving an eigenvector problem involving singular covariance matrices, and the resulting canonical variates were more relevant for discriminative purposes. Thus, the weight vectors found by the modified CVA method not only possessed the same properties as the weight vectors of standard CVA, but also concentrated the discriminative information in the first few canonical variates. The improved discrimination model was more concise and efficient in dealing with the sensitivity problem and with the numerous, seriously multicollinear predictor variables in spectral data. Furthermore, in ICVA-LDA the interpretation could be performed with respect to the original high-dimensional data space. Finally, the proposed ICVA-LDA approach was applied to a four-group problem with near-infrared transmittance spectroscopy data consisting of 310 samples and 404 variables, and compared with LDA combined with principal component analysis (PCA-LDA) and with the standard CVA-LDA method. All three discrimination models were validated using fivefold segmented cross-validation. The results demonstrated that the limitations of LDA were overcome by the PLS algorithm and that the classification performance of LDA was then improved by ICVA. The proposed approach can also be widely used in other fields for the classification and discrimination of small-sample, collinear data.
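The singular-covariance problem mentioned above can be made concrete with the textbook formulation of CVA. The following is only a sketch in standard notation: the symbols $S_W$, $S_B$, $w$, $\bar{x}_k$, $\bar{x}$ and $n_k$ are assumed here for illustration and are not taken from the paper, and the sketch does not reproduce the ICVA algorithm itself.

```latex
% Within-group and between-group scatter matrices for g groups C_1, ..., C_g
S_W = \sum_{k=1}^{g} \sum_{i \in C_k} (x_i - \bar{x}_k)(x_i - \bar{x}_k)^{\mathsf T},
\qquad
S_B = \sum_{k=1}^{g} n_k\,(\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^{\mathsf T}

% Standard CVA seeks weight vectors w that maximize the ratio of
% between-group to within-group variation
J(w) = \frac{w^{\mathsf T} S_B\, w}{w^{\mathsf T} S_W\, w}
\quad\Longrightarrow\quad
S_B\, w = \lambda\, S_W\, w

% For the data set described here: n = 310 samples, p = 404 variables, g = 4 groups
\operatorname{rank}(S_W) \le n - g = 306 < 404 = p,
\qquad
\operatorname{rank}(S_B) \le g - 1 = 3
```

Because $S_W$ is a $404 \times 404$ matrix of rank at most 306, it cannot be inverted, so the usual route through $S_W^{-1} S_B$ is unavailable. This is the situation in which a regression-based solver such as PLS becomes attractive. The bound on $\operatorname{rank}(S_B)$ also shows that a four-group problem admits at most three canonical variates with nonzero eigenvalues.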