Spectroscopy and Spectral Analysis (光谱学与光谱分析)
Study on an Algorithm for Near Infrared Singular Sample Identification Based on Strong Influence Degree
WU Zhao-na1, DING Xiang-qian2, GONG Hui-li1*, DONG Mei3, WANG Mei-xun3
1. College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China 2. Center of Information Engineering, Ocean University of China, Qingdao 266071, China 3. Linyi Tobacco Co., Ltd. of Shandong Province, Linyi 276000, China
Abstract Selecting calibration samples and eliminating singular samples are very important for quantitative and qualitative modeling in near infrared spectroscopy. However, the available methods for identifying singular samples generally rely on estimates of the data center and require an empirically chosen decision threshold, which largely limits their recognition accuracy and practicability. To address the low accuracy of existing singular sample recognition methods, this paper improves an existing metric, the leverage value, and presents a new algorithm for near infrared singular sample identification based on strong influence degree. This metric reduces the dependence on the data center to a certain extent, so that normal samples cluster more tightly and the distance between singular samples and normal samples is widened. At the same time, to avoid setting the threshold unreasonably by experience, the paper introduces the statistical concept of jump degree and proposes an automatic threshold-setting method for distinguishing singular samples. To verify the validity of the algorithm, abnormal samples were eliminated from a calibration set of 200 representative samples using the Mahalanobis distance, the leverage-spectral residual method, and the algorithm presented in this paper, respectively. Quantitative models were then built on the remaining calibration samples by partial least squares (PLS), with nicotine as the index, and the modeling results were compared. In addition, 60 representative test samples were predicted with these models. Finally, the algorithms were compared using root mean square error of cross validation (RMSECV), correlation coefficient (r), and root mean square error of prediction (RMSEP) as evaluation indices.
The experimental results demonstrate that the algorithm for near infrared singular sample identification based on strong influence degree significantly improves the accuracy of singular sample identification over existing methods. With a lower RMSECV (0.104) and RMSEP (0.112) and a higher r (0.983), it also helps to improve the stability and prediction ability of the model.
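The abstract does not give the formulas for the strong influence degree or the jump degree, so the following is only a minimal sketch of the general idea under stated assumptions. It computes the classical leverage value (the metric the paper says it improves on), then picks a threshold automatically at the widest gap in the sorted scores. The function `largest_gap_threshold` is a hypothetical stand-in for the jump-degree rule, not the authors' method, and the data are synthetic. `rmse` and `corr` show how error measures of the RMSECV/RMSEP and r kind are usually computed from reference and predicted values.

```python
import numpy as np


def leverage(X):
    """Classical leverage of each row of X about the column means."""
    Xc = X - X.mean(axis=0)
    # Pseudo-inverse: spectra are highly collinear, so X^T X is often singular.
    G = np.linalg.pinv(Xc.T @ Xc)
    # h_i = x_i^T G x_i for every centred row x_i.
    return np.einsum("ij,jk,ik->i", Xc, G, Xc)


def largest_gap_threshold(scores):
    """ASSUMPTION: illustrative stand-in for the paper's jump-degree rule.

    Puts the cut in the middle of the widest gap between consecutive
    sorted scores, searching only the upper half of the ranking.
    """
    s = np.sort(np.asarray(scores, dtype=float))
    start = len(s) // 2
    k = start + int(np.argmax(np.diff(s[start:])))
    return 0.5 * (s[k] + s[k + 1])


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def corr(y_true, y_pred):
    """Pearson correlation coefficient r."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic demo (not the paper's data): 50 ordinary samples plus 2 planted outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 3)), [[10, 10, 10], [-12, 11, -9]]])
h = leverage(X)
flagged = np.flatnonzero(h > largest_gap_threshold(h))
print("flagged sample indices:", flagged)
```

In a real workflow the flagged rows would be dropped from the calibration set before fitting the PLS model, and `rmse` would be evaluated on cross-validation folds (RMSECV) and on the independent test set (RMSEP).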
Received: 2014-06-01
Accepted: 2014-09-05
Corresponding Author:
GONG Hui-li
E-mail: huiligong@163.com