Fast Outlier Detection for Milk Near-Infrared Spectroscopy Analysis
LIU Rong, CHEN Wen-liang, XU Ke-xin*, QIU Qing-jun, CUI Hou-xin
State Key Laboratory of Precision Measuring Technology and Instruments, College of Precision Instruments & Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China
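Abstract: Near-infrared (NIR) spectroscopy determines physicochemical properties through calibration models, so a fast and accurate check of the spectral data is a prerequisite for reliable analytical results. Outliers in the spectral data, however, can seriously degrade the accuracy of multivariate calibration models and therefore their predictive performance. In this paper, Resampling by Half-Means (RHM) and the Smallest Half-Volume (SHV) method were combined to remove outliers from the measured NIR spectra used for milk component analysis, with results far better than those of traditional outlier-removal methods. The method is also simple, fast, computationally light, and numerically stable, which makes it well suited to online analysis and to outlier detection in other types of spectral data.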
Key words: outliers; near-infrared spectroscopy; Mahalanobis distance; leverage; SHV; RHM
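The abstract names RHM and SHV without detailing them. The sketch below follows the way these two methods are commonly described: RHM repeatedly autoscales the whole data set with the mean and standard deviation of a random half of the samples and counts how often each sample has one of the longest vectors; SHV picks the half of the samples with the smallest summed distances around one sample and then measures every sample's Mahalanobis distance from that half. It is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, `n_iter=1000`, `top_fraction=0.05`, the seed, and the toy data are hypothetical choices, and `X` is assumed to hold samples in rows with fewer variables than half the number of samples (for NIR spectra, for example, a few PCA scores). The abstract does not say how the two methods are combined in this work.

```python
import numpy as np

# Hypothetical helper names; parameter defaults are illustrative assumptions.


def rhm_frequencies(X, n_iter=1000, top_fraction=0.05, seed=0):
    """Resampling by Half-Means (sketch).

    Returns, for each sample, the fraction of resampling rounds in which its
    autoscaled vector length was among the largest `top_fraction` of samples.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = n // 2
    n_top = max(1, int(round(top_fraction * n)))
    counts = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        half = rng.choice(n, size=h, replace=False)  # random half, no replacement
        mu = X[half].mean(axis=0)
        sd = X[half].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # guard against variables that are constant in the half
        Z = (X - mu) / sd  # autoscale ALL samples with half-sample statistics
        lengths = np.linalg.norm(Z, axis=1)
        counts[np.argsort(lengths)[-n_top:]] += 1  # longest vectors this round
    return counts / n_iter


def shv_distances(X):
    """Smallest Half-Volume (sketch).

    Finds the most compact half of the samples in autoscaled space, then
    returns every sample's Mahalanobis distance from that half's mean and
    covariance, together with the indices of the half.
    """
    n = X.shape[0]
    h = n // 2
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)  # n x n distances
    nearest = np.argsort(D, axis=0)[:h]  # column j: the h samples closest to j (incl. j)
    sums = np.take_along_axis(D, nearest, axis=0).sum(axis=0)
    half = nearest[:, np.argmin(sums)]  # column with the smallest distance sum
    mu = X[half].mean(axis=0)
    cov = np.atleast_2d(np.cov(X[half], rowvar=False))
    inv = np.linalg.pinv(cov)  # pseudo-inverse guards against a near-singular cov
    dev = X - mu
    d2 = np.einsum("ij,jk,ik->i", dev, inv, dev)  # squared Mahalanobis distances
    return np.sqrt(np.maximum(d2, 0.0)), half


# Toy data: 20 samples x 3 variables (stand-ins for, e.g., PCA scores),
# with sample 7 shifted far from the rest.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
X[7] += 20.0

freq = rhm_frequencies(X)
dist, half = shv_distances(X)
print("RHM top sample:", freq.argmax(), "SHV farthest sample:", dist.argmax())
```

In practice one might flag samples whose RHM frequency is far above `top_fraction` and whose SHV distance is large; any specific threshold would be a further assumption.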
Abstract: Near-infrared spectroscopy is a fast and efficient analytical technique based on a multivariate calibration model that correlates near-infrared spectra with sample properties such as concentration. The reliability of the analytical results depends largely on the accuracy of the measured spectra, and outliers undermine that reliability. The authors combined RHM (Resampling by Half-Means) with the SHV (Smallest Half-Volume) method to detect outliers in the near-infrared spectra of milk samples, with satisfactory results. The new method outperforms traditional outlier detection algorithms such as the Mahalanobis distance and hat-matrix leverage. The combined method is simple and fast to use, conceptually clear, and numerically stable, so it is recommended for detecting multiple outliers in multivariate data, especially in online measurement and discriminant analysis.
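The two traditional criteria named above, the Mahalanobis distance and hat-matrix leverage, can be computed as in the minimal NumPy sketch below. It assumes samples in rows and a small number of variables (for NIR spectra, typically PCA scores rather than raw absorbances, because with more wavelengths than samples the covariance matrix is singular); the function name and toy data are hypothetical. Both quantities use the mean and covariance of all samples, outliers included, so a group of outliers can inflate these estimates and mask one another, which is the weakness that half-sample methods such as RHM and SHV are designed to reduce.

```python
import numpy as np


def classical_md_and_leverage(X):
    """Mahalanobis distance and hat-matrix leverage from ALL samples (sketch).

    Hypothetical helper name; X holds samples in rows, variables in columns.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # mean-center
    S = np.atleast_2d(np.cov(X, rowvar=False))  # classical covariance, n-1 denominator
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.pinv(S), Xc)  # squared distances
    H = Xc @ np.linalg.pinv(Xc.T @ Xc) @ Xc.T  # hat matrix of the centered data
    leverage = np.diag(H) + 1.0 / n  # add 1/n for the intercept (mean) term
    return np.sqrt(np.maximum(d2, 0.0)), leverage


rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
md, lev = classical_md_and_leverage(X)
# For full-rank centered data: h_i = 1/n + MD_i^2 / (n - 1)
print(np.allclose(lev, 1.0 / 20 + md**2 / 19))
```

Because Xc^T Xc = (n - 1)S, the two criteria are linked by h_i = 1/n + MD_i^2/(n - 1) for full-rank data, so they rank samples identically and share the same sensitivity to masking.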