Variable Selection Methods in Spectral Data Analysis
LI Yan-kun1*, DONG Ru-nan1, ZHANG Jin2, HUANG Ke-nan3, MAO Zhi-yi4
1. Department of Environmental Science and Engineering, North China Electric Power University, Hebei Key Lab of Power Plant Flue Gas Multi-Pollutants Control, Baoding 071003, China
2. School of Food Science, Guizhou Medical University, Guiyang 550025, China
3. The 82nd Army Group Hospital of the Chinese People’s Liberation Army, Baoding 071000, China
4. Tianjin Building Material Science Research Academy, Tianjin 300110, China

Abstract Extracting useful information from massive or high-dimensional data is a major challenge in data analysis and an active area of current research. Variable selection extracts informative variables from large, complex measurement data, which simplifies the multivariate model and can even improve its prediction performance. In spectral analysis, measurement data inevitably contain interfering and irrelevant variables, as well as multicollinearity among variables, all of which degrade the robustness and predictive ability of the model. Variable (wavelength) selection methods have therefore advanced considerably in both the research and the application of spectral analysis. Drawing on the related literature and the authors' research experience, this paper summarizes the proposal, characteristics, development, categories, comparison and applications over the past five years of variable selection methods, not only for near-infrared spectra but also for mid-infrared, Raman and other spectra. Two aspects of any method are vital: the parameters used as criteria or thresholds for evaluating the importance of variables, and the strategy or path by which variables are selected. Each method has its own advantages and limitations, so in practice the method should be chosen according to the characteristics of both the method and the object under analysis. The key contents are: (1) a comparison of wavelength selection and wavelength interval selection methods; (2) a summary of the variable selection methods based on PLS model parameters; (3) a classification and overview of variable selection methods according to their strategies for searching and selecting variables. Finally, we discuss the problems that variable selection methods encounter in real systems (such as overfitting and instability) and the corresponding solutions.
We also look ahead to the research trends, development prospects and application directions of variable selection methods. In particular, new criteria for evaluating variable importance and new selection strategies still require further research. We hope this paper will help promote follow-up research on and applications of variable selection technology.
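
The idea of pairing an importance criterion with a threshold can be illustrated with a minimal sketch. It is not one of the PLS-based methods reviewed in the paper: it assumes a simple filter in which each wavelength is scored by the absolute Pearson correlation between its intensities and the response, and wavelengths at or above a chosen threshold are kept. The function names, the threshold of 0.8 and the synthetic data are all hypothetical and serve only as illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / math.sqrt(sxx * syy)


def select_variables(spectra, response, threshold=0.8):
    """Score each wavelength by |r| with the response and keep those
    whose score is at or above the threshold (assumed criterion)."""
    n_vars = len(spectra[0])
    scores = [
        abs(pearson([row[j] for row in spectra], response))
        for j in range(n_vars)
    ]
    selected = [j for j, s in enumerate(scores) if s >= threshold]
    return selected, scores


# Synthetic "spectra": 40 samples x 10 wavelengths. Only wavelengths 3 and 7
# depend on the response; the remaining columns are pure noise.
rng = random.Random(0)
response = [rng.uniform(0.0, 1.0) for _ in range(40)]
spectra = []
for y in response:
    row = [rng.gauss(0.0, 1.0) for _ in range(10)]
    row[3] = 2.0 * y + rng.gauss(0.0, 0.01)
    row[7] = -1.5 * y + rng.gauss(0.0, 0.01)
    spectra.append(row)

selected, scores = select_variables(spectra, response, threshold=0.8)
print("selected wavelength indices:", selected)
```

A univariate filter of this kind ignores the multicollinearity among wavelengths, and its result depends on the threshold chosen, which is one reason the choice of criterion and of selection strategy matters.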
Received: 2020-11-01
Accepted: 2021-02-16
Corresponding Authors:
LI Yan-kun
E-mail: liyankun@ncepu.edu.cn; liyankun_ncepu@foxmail.com