NIR Spectral Feature Selection Using Lasso Method and Its Application in the Classification Analysis
LI Yu-qiang1, PAN Tian-hong1, 2*, LI Hao-ran1, ZOU Xiao-bo3
1. School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
2. School of Electrical Engineering and Automation, Anhui University, Hefei 230061, China
3. School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, China
Abstract Near-infrared spectroscopy (NIRS) is a non-destructive detection method that uses spectral feature data for qualitative or quantitative analysis. The integrity and representativeness of the feature data determine the performance of the analytical model. However, existing analytical methods extract feature data only from spectral subintervals, so the models built on them have poor stability. To extract features from high-dimensional NIR spectral data and improve the accuracy and stability of NIR spectral models, this paper proposes a spectral screening method based on the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm. Tricholoma matsutake, one of China's high-value foreign trade products, is taken as an example to validate the classification models developed with LASSO. The effectiveness of the feature screening algorithm on high-dimensional spectral data is discussed, and the predictive accuracy and stability of the Tricholoma matsutake identification model and the edible fungus classification model using LASSO and PCA are analyzed. Fresh Tricholoma matsutake has a distinctive shape and is easy to tell apart from counterfeits. Dried Tricholoma matsutake, however, is difficult to distinguish from other mushrooms because all dried mushrooms have a similar flake shape. As a result, adulteration of dried Tricholoma matsutake occurs frequently. In this experiment, 166 dried samples of Yunnan Tricholoma matsutake, Pleurotus eryngii, Jujube hilt nipple mushroom and Agaricus blazei were selected, and 166×512-dimensional raw spectral data were acquired with a NIRQuest 512 NIR spectrometer over a spectral range of 900~1700 nm. After anomalous data were eliminated, the spectra were pre-processed with the standard normal variate (SNV) transformation.
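The SNV pre-processing step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `snv` and `snv_matrix` and the toy spectra are assumptions, and the sample standard deviation (n - 1) is used, which the abstract does not specify.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Scale one spectrum by its own mean and standard deviation (SNV)."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
    if sigma == 0:
        raise ValueError("a constant spectrum cannot be SNV-scaled")
    return [(x - mu) / sigma for x in spectrum]


def snv_matrix(spectra):
    """Apply SNV row by row: each row is one sample, each column one wavelength."""
    return [snv(row) for row in spectra]


# Toy data (hypothetical): the second spectrum is the first one with an
# additive offset and a multiplicative gain, as scatter effects often produce.
raw = [
    [0.10, 0.20, 0.30, 0.40],
    [1.10, 1.30, 1.50, 1.70],
]
corrected = snv_matrix(raw)
for row in corrected:
    print([round(v, 4) for v in row])
```

Because SNV removes a per-spectrum offset and gain, the two toy rows become identical after correction, which is the behaviour that makes it useful before feature selection.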
After spectral pretreatment, LASSO was used to extract feature variables from the high-dimensional NIR spectral data. A typical linear modeling algorithm (k-Nearest Neighbor, KNN) and a nonlinear one (Back-Propagation neural network, BP), each combined with the Kennard-Stone method, were then used to construct the Tricholoma matsutake identification model and the edible fungus classification model. The effectiveness of the models using LASSO and PCA was compared, and the predictive accuracy and stability of the KNN and BP models were assessed with the Monte Carlo method. The experimental results demonstrate that the models using LASSO achieve better prediction accuracy and stability than those using PCA. With the original spectral data, the prediction accuracies were 69.57% (BP) and 60.87% (KNN) for the identification model, and 67.39% (BP) and 65.22% (KNN) for the edible fungus classification model. With LASSO, they rose to 100% (BP) and 67.39% (KNN) for the identification model, and 89.13% (BP) and 80.43% (KNN) for the edible fungus classification model. The two models were each run through 10 Monte Carlo repetitions, giving average accuracies of 99.93% and 97.22%, respectively. Compared with conventional feature selection methods such as PCA, the LASSO algorithm can extract features from high-dimensional NIR spectral data and improve the accuracy and stability of models built on them. The proposed algorithm therefore offers an alternative feature extraction method for NIR spectral data analysis.
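To make the LASSO screening idea concrete, the sketch below fits a LASSO regression by cyclic coordinate descent and keeps the columns whose coefficients are non-zero. It is only an illustration under stated assumptions: the abstract does not say which solver, penalty value, or label encoding the authors used, so the objective (1/(2n))·||y − Xβ||² + α·||β||₁, the value `alpha=0.1`, all function names, and the synthetic data are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimise (1/(2n))*||y - X b||^2 + alpha*||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]  # equals y - X b while b is all zeros
    beta = [0.0] * p
    scale = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, alpha) / scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_beta
    return beta


def selected_features(beta):
    """Indices of the variables LASSO kept (non-zero coefficients)."""
    return [j for j, b in enumerate(beta) if b != 0.0]


# Synthetic stand-in for spectra (not the matsutake data): 60 samples,
# 6 "wavelengths", with only columns 0 and 3 driving the response.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(60)]
y = [2.0 * row[0] - 3.0 * row[3] + random.gauss(0.0, 0.01) for row in X]

beta = lasso_coordinate_descent(X, y, alpha=0.1)
kept = selected_features(beta)
print("selected columns:", kept)
```

The L1 penalty drives the coefficients of uninformative columns to exactly zero, so the surviving indices serve as the selected wavelengths that would then be passed to a classifier such as KNN or BP. In practice the penalty strength would normally be tuned, for example by cross-validation, rather than fixed as it is here.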
Received: 2018-10-31
Accepted: 2019-02-10
Corresponding Authors:
PAN Tian-hong
E-mail: thpan@live.com