A Novel Approach to NIR Spectral Quantitative Analysis: Semi-Supervised Least-Squares Support Vector Regression Machine
LI Lin1, XU Shuo2*, AN Xin3, ZHANG Lu-da4
1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100193, China
2. Information Technology Supporting Center, Institute of Scientific and Technical Information of China, Beijing 100038, China
3. School of International Trade and Economics, University of International Business and Economics, Beijing 100029, China
4. College of Science, China Agricultural University, Beijing 100193, China
Abstract: In near-infrared (NIR) spectral quantitative analysis, the precision of the samples' measured chemical values sets the theoretical limit on the precision that a mathematical model can attain. However, accurate chemical values can be obtained for only a small number of samples. Many models discard the samples that lack chemical values and use only the samples with chemical values when modeling the contents of sample components. To address this problem, a semi-supervised LS-SVR (S2LS-SVR) model is proposed on the basis of the least-squares support vector regression (LS-SVR) machine; it can use samples without chemical values as well as those with chemical values. As with LS-SVR, training this model is equivalent to solving a linear system. Flue-cured tobacco samples were taken as the experimental material, and quantitative analysis models were constructed for the contents of four components (total sugar, reducing sugar, total nitrogen and nicotine) with PLS regression, LS-SVR and S2LS-SVR. For the S2LS-SVR model, the average relative errors between actual and predicted values for the four components are 6.62%, 7.56%, 6.11% and 8.20%, respectively, and the correlation coefficients are 0.9741, 0.9733, 0.9230 and 0.9486, respectively. The experimental results show that the S2LS-SVR model outperforms the other two, which verifies the feasibility and efficiency of the S2LS-SVR model.
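To illustrate the statement that training reduces to solving a linear system, the following minimal sketch fits a standard (fully supervised) LS-SVR in Python with NumPy and computes the two evaluation measures quoted above. It is not the S2LS-SVR model itself, whose formulation is not given in the abstract. The RBF kernel, the hyperparameter values `gamma` and `sigma`, the function names and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma=100.0, sigma=1.0):
    """Train a standard LS-SVR by solving one (n+1) x (n+1) linear system.

    The system is
        [ 0   1^T          ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma  ] [ alpha ] = [ y ]
    where K is the kernel matrix of the training samples.
    """
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual coefficients alpha


def lssvr_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predict with f(x) = sum_i alpha_i * k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


def mean_relative_error(y_true, y_pred):
    """Average of |predicted - actual| / |actual|."""
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))


def correlation(y_true, y_pred):
    """Pearson correlation coefficient between actual and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic stand-in for spectra (X) and chemical values (y); assumption only.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3])

gamma, sigma = 100.0, 1.0
b, alpha = lssvr_fit(X, y, gamma=gamma, sigma=sigma)
pred = lssvr_predict(X, b, alpha, X, sigma=sigma)

print("mean relative error:", mean_relative_error(y, pred))
print("correlation:", correlation(y, pred))
```

In this sketch only samples with known `y` enter the system; a semi-supervised variant would additionally have to incorporate the samples without chemical values, which is the part the paper contributes.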