Kernel Mahalanobis-Driven Clustering for Outlier Detection in Mid-Infrared Spectroscopy
HU Rui1, 2, LI Yu-jun1, 2*, JIAO Shang-bin1, 2, SUN Peng-cheng1, 2, WU Chen-yan1, 2
1. School of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048, China
2. Shaanxi Province Complex System Control and Intelligent Information Processing Key Laboratory, Xi'an 710048, China
Abstract In the quantitative analysis of alkane gas mixtures by infrared spectroscopy, manually preparing calibration samples is complex, requiring precise control of parameters such as multi-component gas concentrations, ambient temperature, and gas pressure. Operational deviations can easily cause the spectral data to deviate from the calibrated concentrations, producing anomalous samples. Traditional single-method anomaly detection struggles to handle complex anomaly patterns in high-dimensional, nonlinear data. To address this problem, this paper proposes a hybrid anomaly detection framework that combines the kernel Mahalanobis distance (KMD) with K-means clustering. By pairing kernelized feature mapping with dynamic density clustering, the framework overcomes the covariance matrix singularity problem and the insufficient sensitivity to local anomalies that arise with high-dimensional samples. The KMD is used to construct a nonlinear high-dimensional feature space and to quantify, through the covariance matrix, how anomalous the spectrum-concentration mapping of each sample is; a 95% confidence threshold (χ²_{0.95}) then screens potential anomaly candidates. The K-means algorithm partitions the training set into seven sub-clusters (the number is chosen by the elbow rule), and a dynamic threshold based on the standard deviation of the distance from each test sample to its nearest cluster centroid rejects anomalous samples. The final decision combines the two thresholds through a logical AND. The experiments used a Bruker Tensor 27 spectrometer (Germany) to collect 938 sets of samples (wavelength 2.5~25 μm, resolution 4 cm⁻¹), with methane and ethane as the component gases of interest.
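The dual-threshold procedure described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian (RBF) kernel, its width `gamma`, the number of retained kernel principal components `n_components`, the multiplier `n_std` on the standard deviation, and all function names are hypothetical choices. The χ² quantile is approximated with the Wilson-Hilferty formula to avoid extra dependencies, and the same synthetic data set stands in for both the training and the test samples.

```python
import numpy as np
from statistics import NormalDist


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_mahalanobis_sq(X, gamma, n_components):
    """Squared Mahalanobis distance in the kernel feature space, restricted
    to the leading kernel principal components (assumed regularisation that
    avoids inverting a singular covariance matrix)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    Kc = H @ rbf_kernel(X, X, gamma) @ H          # centred kernel matrix
    eigval, eigvec = np.linalg.eigh(Kc)           # eigenvalues in ascending order
    lam = eigval[::-1][:n_components]             # largest eigenvalues
    V = eigvec[:, ::-1][:, :n_components]         # matching eigenvectors
    scores = V * np.sqrt(lam)                     # kernel PCA scores
    variances = lam / (n - 1)                     # variance of each score column
    return (scores**2 / variances).sum(axis=1)


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3


def kmeans_centroids(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns the k cluster centroids."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres


def detect_outliers(X, gamma=None, n_components=5, k=7, n_std=2.0):
    """Flag a sample only if BOTH criteria mark it as anomalous (logical AND)."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]                  # assumed default kernel width
    d2 = kernel_mahalanobis_sq(X, gamma, n_components)
    kmd_flag = d2 > chi2_quantile(0.95, n_components)
    centres = kmeans_centroids(X, k)
    nearest = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2).min(axis=1)
    km_flag = nearest > nearest.mean() + n_std * nearest.std()
    return kmd_flag & km_flag


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))    # synthetic stand-in for spectral features
X[:3] += 8.0                      # three artificially shifted samples
mask = detect_outliers(X)
print(int(mask.sum()), "samples flagged by both criteria")
```

Requiring both flags keeps the detector conservative: a sample is rejected only when it is far from the data centre in the kernel feature space and also far from every cluster centroid. How many samples are flagged depends strongly on the assumed `gamma`, `n_components`, and `n_std`.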
The method was validated with a partial least squares (PLS) regression model and compared with the traditional Mahalanobis distance (MD) method. After the anomalous samples were excluded, the mean relative error (MRE) of methane concentration prediction decreased from 38.29% to 18.77%, an improvement of 11.52 percentage points over the MD method (30.44%). The MRE of ethane decreased from 54.51% to 26.03%, an improvement of 13.39 percentage points over the MD method (39.42%), and the analysis accuracy of the model improved by more than 50% for both gases. The proposed method not only addresses, in theory, the bottleneck of anomaly detection in high-dimensional spaces, but also proves effective in practice for the quantitative analysis of infrared spectra of complex gas mixtures. Compared with traditional methods, the hybrid detection framework of kernel Mahalanobis distance and K-means clustering is markedly more robust on nonlinear, multidimensional data. The method offers a reliable and effective way to clean anomalous data in the quantitative analysis of infrared spectra of alkane gas mixtures.
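As a quick check on the reported figures, the short sketch below computes the relative decrease in MRE from the values quoted in the abstract. The function names are hypothetical, and the MRE formula shown (the mean of |predicted - true| / |true|, in percent) is the common definition, assumed here because the abstract does not spell it out.

```python
def mean_relative_error(y_true, y_pred):
    """Mean relative error in percent: mean(|pred - true| / |true|) * 100."""
    errors = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def relative_reduction(before, after):
    """Relative decrease of an error value, in percent."""
    return 100.0 * (before - after) / before


# MRE values reported in the abstract, before and after outlier removal.
for gas, before, after in [("methane", 38.29, 18.77), ("ethane", 54.51, 26.03)]:
    print(f"{gas}: {relative_reduction(before, after):.1f}% lower MRE")
# prints:
# methane: 51.0% lower MRE
# ethane: 52.2% lower MRE
```

Both relative reductions exceed 50%, which is consistent with the improvement stated for the two gases.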
Received: 2025-03-19
Accepted: 2025-06-26
Corresponding Authors:
LI Yu-jun
E-mail: leo@xaut.edu.cn