Research on t-SNE Similarity Measurement Method Based on Wasserstein Divergence
LIU Xin-peng1, SUN Xiang-hong2, QIN Yu-hua1*, ZHANG Min1, GONG Hui-li3
1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China
2. Information Center, China Tobacco Jiangxi Industrial Co., Ltd., Nanchang 330096, China
3. Faculty of Information Science and Engineering, China Ocean University, Qingdao 266100, China
Abstract Near-infrared (NIR) spectra are high-dimensional, highly redundant, and nonlinear, which seriously affects similarity measurement between samples. This paper proposes a t-distributed stochastic neighbor embedding algorithm based on Wasserstein divergence (Wt-SNE). Following the idea of manifold learning, a Gaussian distribution converts the distances between high-dimensional data points into a probability distribution, and a t-distribution, which is heavier-tailed, represents the probability distribution of the corresponding points in the low-dimensional space. The probability distribution of the high-dimensional data is embedded into the low-dimensional space to reconstruct the low-dimensional manifold structure. The Wasserstein divergence is introduced to measure the difference between the probability distributions in the two spaces, and reducing the divergence value increases their similarity, which realizes the dimensionality reduction of the high-dimensional data. To verify the effectiveness of Wt-SNE, the algorithm is first used to project tobacco NIR spectral data into a low-dimensional space and is compared with PCA, LPP, and t-SNE. The results show that the class boundaries between samples in the low-dimensional space are clearer after dimensionality reduction with Wt-SNE. Second, KNN, SVM, and PLS-DA classifiers were used to predict tobacco origin from the reduced data, with accuracies of 93.8%, 91.5%, and 92.7%, respectively, indicating that the reduced data not only reconstruct the spatial structure of the original spectra but also retain the similarity relationships between samples.
Finally, tobacco from a particular cigarette formula was selected as the target for single-material replacement, and replacement samples were selected according to the Marginal distance between the candidate samples and the target samples. The experiments showed that the replacement tobacco selected by Wt-SNE had the highest similarity to the target tobacco: its chemical composition, such as nicotine and total sugar content, differed less from that of the target tobacco, and its aroma, smoke, and taste scores were highly consistent with those of the target. The method can effectively measure the similarity between tobacco NIR spectra and provides a strong basis for maintaining cigarette formulas.
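The two probability distributions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the single fixed bandwidth `sigma`, and the toy data are assumptions made for illustration, and because the abstract does not give the form of the Wasserstein divergence, the Kullback-Leibler divergence of standard t-SNE is used here as a stand-in for the divergence term.

```python
import math

def pairwise_sq_dists(points):
    """Squared Euclidean distance between every pair of points."""
    n = len(points)
    return [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def normalize(weights):
    """Scale a weight matrix so that all entries sum to 1."""
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]

def gaussian_affinities(X, sigma=1.0):
    """High-dimensional similarities from a Gaussian kernel.
    Assumption: one global sigma; t-SNE normally tunes it per point."""
    d = pairwise_sq_dists(X)
    n = len(X)
    return normalize([[0.0 if i == j else math.exp(-d[i][j] / (2 * sigma ** 2))
                       for j in range(n)] for i in range(n)])

def student_t_affinities(Y):
    """Low-dimensional similarities from a heavy-tailed Student t kernel
    with one degree of freedom."""
    d = pairwise_sq_dists(Y)
    n = len(Y)
    return normalize([[0.0 if i == j else 1.0 / (1.0 + d[i][j])
                       for j in range(n)] for i in range(n)])

def kl_divergence(P, Q, eps=1e-12):
    """Stand-in divergence (standard t-SNE); Wt-SNE would use a
    Wasserstein divergence here instead."""
    return sum(p * math.log((p + eps) / (q + eps))
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)

# Toy data (hypothetical): two tight clusters in 3-D and a 2-D embedding.
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
Y = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
P = gaussian_affinities(X)
Q = student_t_affinities(Y)
print(kl_divergence(P, Q))
```

A smaller divergence value means the low-dimensional similarities match the high-dimensional ones more closely; an optimizer would move the points in `Y` to reduce it.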
Received: 2022-10-04
Accepted: 2023-04-14
Corresponding Authors:
QIN Yu-hua
E-mail: yuu71@163.com