Local Preserving Projection Similarity Measure Method Based on Kernel Mapping and Rank-Order Distance
QIN Yu-hua1, ZHANG Meng1*, YANG Ning2, SHAN Qiu-fu3
1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China
2. Qingdao Lanzhi Modern Service Industry Digital Engineering Research Center, Qingdao 266071, China
3. China Tobacco Yunnan Industrial Co., Ltd., Technical Research Center, Kunming 650024, China
Abstract: To address the curse of dimensionality that arises when measuring spectral similarity, caused by the high dimensionality, high redundancy, nonlinearity and small sample sizes of near-infrared spectra, this paper proposes a locality preserving projection algorithm based on kernel mapping and rank-order distance (KRLPP). First, the spectral data are mapped to a higher-dimensional space through a kernel transformation, which preserves the nonlinear characteristics of the manifold structure. Then, the dimensionality of the data is reduced by the locality preserving projections (LPP) algorithm, in which the rank-order distance replaces the traditional Euclidean or geodesic distance, so that a more accurate local neighborhood relationship is obtained by sharing the information of neighboring points. Finally, spectral similarity is measured by calculating distances in the low-dimensional space. This method overcomes the failure of distance measures in high-dimensional space and improves the accuracy of similarity measurement. To verify the effectiveness of the KRLPP algorithm, the best parameters, namely the number of nearest neighbors k and the dimensionality d of the reduced space, were first determined from the variation of the residuals of the dataset before and after dimension reduction. Second, KRLPP was compared with the PCA, LPP and INLPP algorithms in terms of the projection effect of the spectral dimension reduction and the classification ability of the model. The results show that the KRLPP algorithm distinguishes tobacco positions better, and that its dimension reduction and correct identification of different tobacco positions are significantly better than those of the PCA, LPP and INLPP methods. Finally, five representative tobacco samples were selected as target tobaccos from the formula of a certain cigarette brand.
PCA, LPP and KRLPP were then used to find similar tobaccos for each target tobacco among 300 tobacco samples used for formula maintenance, and the tobaccos and cigarette formulas before and after replacement were evaluated in terms of chemical composition and sensory quality. The LPP and KRLPP dimensionality reductions used the same parameter settings, and 6 principal components were selected for PCA. The results showed that, compared with the PCA and LPP methods, the replacement tobaccos and replacement formulas selected by the KRLPP algorithm differed least from the originals in the chemical components total sugar, reducing sugar, total nicotine and total nitrogen and in sensory indexes such as aroma, smoke and taste, and the accuracy of the similarity measurement was the highest. This method can be applied to search for alternative raw materials for formula products and to assist enterprises in maintaining product quality.
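The rank-order distance that the abstract substitutes for the Euclidean or geodesic distance can be illustrated with a minimal sketch. It assumes the symmetric definition of Zhu, Wen and Sun (CVPR 2011) and builds each sample's neighbor order list from plain Euclidean distances. The abstract does not state which variant KRLPP uses, nor how it is combined with the kernel mapping, so the function name `rank_order_distance` and these choices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def rank_order_distance(X):
    """Symmetric rank-order distance between the rows of X.

    Assumed definition (Zhu, Wen and Sun, CVPR 2011):
      D(a, b)   = sum of the ranks, in b's order list, of the samples
                  that a ranks from itself up to and including b
      D_R(a, b) = (D(a, b) + D(b, a)) / min(O_a(b), O_b(a))
    where O_a(b) is the rank of b in a's order list (a itself has rank 0).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances, used only to build the order lists.
    diff = X[:, None, :] - X[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    # Force every sample to rank itself first, even if duplicates exist.
    np.fill_diagonal(euclid, -1.0)

    # order[a] lists all samples sorted by increasing distance from a.
    order = np.argsort(euclid, axis=1, kind="stable")
    # rank[a, x] is the position of sample x in a's order list.
    rank = np.empty_like(order)
    rank[np.arange(n)[:, None], order] = np.arange(n)[None, :]

    def asymmetric(a, b):
        # Samples that a ranks at or before b, looked up in b's order list.
        top = order[a, : rank[a, b] + 1]
        return rank[b, top].sum()

    D = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            denom = min(rank[a, b], rank[b, a])  # at least 1 when a != b
            D[a, b] = D[b, a] = (asymmetric(a, b) + asymmetric(b, a)) / denom
    return D
```

Two samples receive a small rank-order distance when they appear near the top of each other's order lists and share the same close neighbors, which is the sense in which neighboring points share information. In a pipeline such as the one the abstract describes, a matrix like this could replace the Euclidean distance matrix when selecting the k nearest neighbors for the LPP adjacency graph.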