Study on Feature Selection of Near Infrared Spectra Based on Feature Hierarchical Combining Improved Particle Swarm Optimization
XU Bao-ding1, QIN Yu-hua2, YANG Ning1, GAO Rui3*, YUAN Cheng-cheng1
1. College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
2. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China
3. China Tobacco Yunnan Industrial Co., Ltd., Technical Research Center, Kunming 650024, China
Abstract In quantitative modeling of near-infrared spectroscopy data, high redundancy and high noise in the data severely degrade the robustness and accuracy of the model. This paper therefore presents a spectral feature selection method that combines feature stratification with an improved particle swarm optimization (PSO) algorithm. First, the importance score of each feature is measured by mutual information, and the features are sorted in descending order of importance. This avoids the loss of important information that principal component dimensionality reduction can cause. Second, the concept of jump degree is introduced and a feature stratification method is constructed: features of similar importance are merged into the same feature subset, so the descending-ordered feature set is segmented into distinct feature subsets. This avoids the screening uncertainty caused by manually setting an importance-score threshold during feature selection. Finally, PSO, which converges quickly and has few control parameters, is used to search for the optimal feature subset. PSO is improved in two respects. (1) A chaotic model is introduced to increase population diversity and improve the global search ability of PSO, so that it avoids falling into local optima. (2) The number of features is introduced into the fitness function, and a penalty factor adjusts the influence of the number of features on the fitness function in the early iterations, improving the adaptability of the algorithm. The stratified feature subsets are then supplied to the improved PSO algorithm, which selects a highly discriminative feature subset.
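The stratification step can be illustrated with a minimal sketch. The abstract does not give the formula for the jump degree, so the sketch assumes it is the gap between adjacent scores in the descending ranking, and that a gap larger than a multiple of the mean gap marks a layer boundary. The function name `stratify_by_jump` and the parameter `factor` are hypothetical, and the importance scores are taken as given rather than computed from mutual information.

```python
from statistics import mean


def stratify_by_jump(scores, factor=2.0):
    """Split feature indices into layers at large drops in ranked importance.

    ASSUMPTION: the 'jump degree' is modeled as the difference between
    adjacent scores in the descending ranking, and a jump larger than
    `factor` times the mean jump starts a new layer. The paper's actual
    definition may differ.
    """
    # Feature indices sorted by importance score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if len(order) < 2:
        return [order]

    ranked = [scores[i] for i in order]
    jumps = [ranked[k] - ranked[k + 1] for k in range(len(ranked) - 1)]
    threshold = factor * mean(jumps)

    layers, current = [], [order[0]]
    for k, jump in enumerate(jumps):
        if jump > threshold:
            # A large drop in importance closes the current layer.
            layers.append(current)
            current = []
        current.append(order[k + 1])
    layers.append(current)
    return layers


if __name__ == "__main__":
    example_scores = [0.90, 0.10, 0.88, 0.52, 0.50, 0.12]
    print(stratify_by_jump(example_scores))
```

Because the boundaries come from the score distribution itself, no fixed importance threshold has to be chosen by hand, which is the uncertainty the stratification step is meant to avoid.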
The feature selection process is illustrated with the nicotine index as an example. Near-infrared spectra were acquired with a Nicolet Antaris II near-infrared spectrometer over a scanning range of 4 000~10 000 cm⁻¹. First, mutual information is used to calculate the importance score of each of the 1 557 full-spectrum features for quantitative modeling of the index to be measured, taking the average over 30 experiments. Second, all features are sorted in descending order of importance score and the jump degree of every feature is calculated. The critical points for stratification are located from the jump degree, and the features are divided into feature layers, giving a feature set of 8 feature subsets, S={S′1, S′2, S′3, S′4, S′5, S′6, S′7, S′8}. Then the nested feature sets {S′1}, {S′1, S′2}, {S′1, S′2, S′3},…,{S′1, S′2, S′3, S′4, S′5, S′6, S′7, S′8} are used in turn as candidates for the initial particle swarm. R/(1+RMSEP) serves as the criterion for evaluating feature subsets; each experiment is iterated 50 times, and the feature subset with the largest ratio is taken as the optimal feature subset. To verify the effectiveness of the algorithm, representative tobacco near-infrared spectral data are selected as the training set and test set, PLS quantitative models of nicotine and total sugar are established, and the results are compared with models built on the full spectrum, on the stratified characteristic spectrum, and on the characteristic spectrum selected by the particle swarm algorithm. The simulation results show that, for nicotine and total sugar respectively, the features selected by this algorithm give modeling correlation coefficients R of 0.988 5 and 0.982 2, cross-validation root mean square errors (RMSECV) of 0.098 4 and 0.889 3, and root mean square errors of prediction (RMSEP) of 0.901 6 and 0.100 7. The accuracy is significantly higher than that of the other three methods.
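The evaluation criterion R/(1+RMSEP) and the nested candidate sets can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, R is assumed to be the Pearson correlation between reference and predicted values, and the PLS model that would produce the predictions is not included.

```python
from math import sqrt


def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    cov = sum((a - mean_true) * (b - mean_pred) for a, b in zip(y_true, y_pred))
    var_true = sum((a - mean_true) ** 2 for a in y_true)
    var_pred = sum((b - mean_pred) ** 2 for b in y_pred)
    return cov / sqrt(var_true * var_pred)


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on the test set."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def subset_score(y_true, y_pred):
    """Criterion from the abstract: R / (1 + RMSEP); larger is better."""
    return pearson_r(y_true, y_pred) / (1.0 + rmsep(y_true, y_pred))


def nested_candidates(layers):
    """Build {S1}, {S1, S2}, ..., {S1, ..., Sn} from ordered feature layers."""
    candidates, merged = [], []
    for layer in layers:
        merged = merged + list(layer)
        candidates.append(merged)
    return candidates


if __name__ == "__main__":
    print(nested_candidates([[0, 2], [3], [5, 1]]))
    print(subset_score([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

The ratio rewards a high correlation and a low prediction error at the same time, so a subset cannot score well on correlation alone while its predictions are systematically offset.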
In terms of the number of selected features, the proposed algorithm selects the fewest. It effectively eliminates weakly correlated, noisy, and redundant information in the original feature set, minimizes the number of principal factors of the model, and reduces model complexity, so the resulting model is more stable and more adaptable.
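The two PSO modifications named in the abstract can be sketched under stated assumptions. The abstract does not specify the chaotic model or the form of the penalty, so the sketch assumes a logistic map for chaotic initialization of binary particles and a linearly decaying penalty on the fraction of selected features. All function names and the parameters `mu`, `x0`, and `penalty` are hypothetical.

```python
def logistic_map_sequence(x0, length, mu=4.0):
    """Generate a chaotic sequence with the logistic map x <- mu * x * (1 - x).

    ASSUMPTION: the logistic map stands in for the unspecified chaotic model.
    """
    sequence, x = [], x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        sequence.append(x)
    return sequence


def chaotic_binary_swarm(n_particles, n_features, x0=0.37):
    """Initialize binary particles (1 = feature selected) from a chaotic sequence."""
    values = logistic_map_sequence(x0, n_particles * n_features)
    return [
        [1 if values[p * n_features + j] > 0.5 else 0 for j in range(n_features)]
        for p in range(n_particles)
    ]


def penalized_fitness(score, n_selected, n_total, iteration, max_iter, penalty=0.1):
    """Subtract a feature-count penalty that fades as iterations progress.

    ASSUMPTION: the penalty weight decays linearly from `penalty` to zero, so
    the number of features matters mostly in the early iterations.
    """
    weight = penalty * max(0.0, 1.0 - iteration / max_iter)
    return score - weight * n_selected / n_total


if __name__ == "__main__":
    swarm = chaotic_binary_swarm(n_particles=3, n_features=6)
    for particle in swarm:
        print(particle, penalized_fitness(0.8, sum(particle), len(particle), 0, 10))
```

Here `score` would be the R/(1+RMSEP) value of the model built on the particle's selected features. Early iterations favor compact subsets, while later iterations compare particles on predictive quality alone.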
Received: 2018-01-18
Accepted: 2018-05-11
Corresponding Authors:
GAO Rui
E-mail: gaorui177@163.com