XGBoost-Based Stellar Spectral Classification and Quantized Feature
ZHANG Xiao1,2, LUO A-li1*
1. Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract Stellar spectral classification is fundamental to stellar research. The Morgan-Keenan (MK) system, developed in the 1970s, is the most widely used classical classification scheme. However, MK classification relies on interactive human decisions, which makes it difficult to apply to massive volumes of astronomical spectral data. At present, the most widely used automatic classification method is template matching, which does not measure individual spectral lines. How to extract features from massive data objectively and precisely, and how to use those features to make classification decisions, has therefore become an active topic. In this paper, we process the spectra of LAMOST DR4 stars to obtain line indices as input features and use the officially released labels of the spectra as the target. The XGBoost algorithm is applied to classify the stellar spectra automatically and to rank the features, revealing both known and potential line indices that are sensitive to classification. First, we selected and labeled LAMOST spectra of B-, A-, F- and M-type stars with a high signal-to-noise ratio (S/N>30), giving a sample of about 41.4 million. Line indices were then calculated from the spectra to reduce dimensionality and filter out redundant information. Second, the processed spectral data were randomly divided into a training set and a test set. With the parameters tuned, the required classification decision tree model was fitted on the training set using XGBoost, and its stability and usability were validated on the test set to guard against over-fitting. At the same time, the classification features were extracted with the algorithm's built-in function. Finally, the branch with the highest probability was selected as the final decision tree model.
Experiments show that, under fixed parameters, the XGBoost model adapts well and is little affected by the choice of data set, reaching an overall accuracy of 88.5%. Moreover, the output classification decision tree is more consistent with known features, and the numerical spectral features and their corresponding ranges can be obtained from the model. This provides a basis for quantitative rules for evaluating classification decision trees with numerical spectral features.
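The abstract does not state how the line indices are computed. As a rough illustration only, the sketch below assumes a common equivalent-width style definition: a linear pseudo-continuum is drawn between the mean fluxes of a blue and a red side band, and the fractional flux deficit is integrated over the feature band. The function names and the band limits in the example are hypothetical and are not taken from the paper.

```python
def mean_flux(wave, flux, lo, hi):
    """Average flux over samples with lo <= wavelength <= hi."""
    vals = [f for w, f in zip(wave, flux) if lo <= w <= hi]
    if not vals:
        raise ValueError("no samples fall inside the band")
    return sum(vals) / len(vals)


def line_index(wave, flux, blue, band, red):
    """Equivalent-width style line index, in wavelength units.

    blue, band and red are (start, end) wavelength pairs; wave must be
    sorted in increasing order. This is an assumed definition, not
    necessarily the one used by the authors.
    """
    f_blue = mean_flux(wave, flux, *blue)
    f_red = mean_flux(wave, flux, *red)
    w_blue = 0.5 * (blue[0] + blue[1])
    w_red = 0.5 * (red[0] + red[1])
    # Linear pseudo-continuum through the two side-band midpoints.
    slope = (f_red - f_blue) / (w_red - w_blue)
    pts = [
        (w, 1.0 - f / (f_blue + slope * (w - w_blue)))
        for w, f in zip(wave, flux)
        if band[0] <= w <= band[1]
    ]
    # Trapezoidal integration of the fractional flux deficit.
    return sum(
        0.5 * (y0 + y1) * (w1 - w0)
        for (w0, y0), (w1, y1) in zip(pts, pts[1:])
    )


# Toy spectrum: flat continuum with a rectangular absorption dip.
wave = [4000.0 + i for i in range(101)]
flux = [0.5 if 4040 <= w <= 4060 else 1.0 for w in wave]
print(line_index(wave, flux, (4020, 4030), (4040, 4060), (4070, 4080)))
```

A dip of depth 0.5 spanning a 20-unit band yields an index of 10 in the same wavelength units, which is the kind of low-dimensional numerical feature a tree-based classifier can then threshold on.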
Received: 2018-09-07
Accepted: 2019-01-18
Corresponding Authors:
LUO A-li
E-mail: lal@nao.cas.cn