Stellar Spectra Classification with Entropy-Based Learning Machine
LIU Zhong-bao1, REN Juan-juan2, SONG Wen-ai1,*, ZHANG Jing1, KONG Xiao2, FU Li-zhen1
1. School of Software, North University of China, Taiyuan 030051, China
2. Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012, China
*Corresponding author e-mail: songwenai@nuc.edu.cn
Abstract

Data mining is widely used in stellar spectra classification. In order to improve the performance of traditional spectra classification methods, the Entropy-based Learning Machine (ELM) was proposed in this paper. In ELM, entropy is used to describe the uncertainty of classification. To obtain the desired classification results, the classification uncertainty should be minimized; on this basis, the optimization problem of ELM can be obtained. ELM performs well both in binary classification and in rare spectra mining. Comparative experiments on 4 subclasses of K-type spectra, 3 subclasses of F-type spectra and 3 subclasses of G-type spectra from the Sloan Digital Sky Survey (SDSS) verify that ELM performs better than kNN (k Nearest Neighbor) and SVM (Support Vector Machine) in dealing with the problem of stellar spectra classification on the SDSS datasets.

Keywords: Data mining; Stellar spectra classification; Entropy; Sloan Digital Sky Survey (SDSS)

Introduction

With the development of data mining algorithms, intelligent classification methods have been widely used in spectra classification, and many popular methods have been proposed. AutoClass, an effective spectra classification method based on Bayesian statistics, performed well in distinguishing some new spectra[1]. A two-layer BP neural network was utilized to classify subclasses of stellar spectra[2]. Wavelet transform was used to analyze the spectra[3]. A self-organizing map algorithm based on an unsupervised artificial neural network was proposed by Mahdi; its advantage is that it can complete the classification task without training datasets[4]. A new artificial neural network proposed by Navarro performed well in classifying spectra with low signal-to-noise ratio[5]. A spectra classification method based on the nearest neighbor algorithm was proposed by Li, which realizes incremental learning without a training process[6]. A spectra classification method based on Principal Component Analysis (PCA) and Support Vector Machine (SVM) was proposed by Qin[7]. An automatic M-type spectra recognition method based on wavelet transform was proposed by Liu[8]. A spectra classification method based on generalized discriminant analysis was proposed by Xu[9]. The advantages of the covering algorithm and the kernel technique were combined to classify celestial spectra by Yang[10]. A post-processing method for stellar spectra classification rules based on predicate logic was presented by Cai[11]. A Non-linearly Assembling Learning Machine (NALM) was proposed and applied to large-scale stellar spectral classification by Liu[12].

The spectra classification methods mentioned above perform well in practice, but they are insensitive to outlier data, which may be the reflection of unknown celestial bodies. In view of this, the Entropy-based Learning Machine (ELM) is proposed in this paper. The advantages of ELM are that (1) it takes the data distribution into consideration in the process of classification, and (2) its performance is much better than that of several traditional classifiers, as verified in the section on comparative experiments.

The rest of this paper is organized as follows. Section 1 provides the background knowledge on entropy; ELM is proposed in Section 2; the comparative experiments on the SDSS datasets are shown in Section 3; Section 4 concludes our work.

1 Entropy Theory

Entropy[13] is used to describe the uncertainty of a system and is widely applied in information theory, life sciences, ecology and other related areas. In information theory, the uncertainty of an information system is described by entropy: the larger the entropy, the greater the uncertainty, and vice versa.

Several entropies are used in practice, such as conditional entropy, Shannon entropy, square entropy and cube entropy. In this paper, the square entropy was used to describe the uncertainty of spectra classification. Suppose the given dataset is X={x1, x2, …, xN} where N is the data size. The square entropy can be defined as follows.

$H_p = 1 - \int p(x)^2 \, \mathrm{d}x$  (1)

where p(x) is the Parzen window[14], which reflects the distribution of the training data. The Parzen window p(x) can be defined as follows.

$p(x) = \sum_{i=1}^{N} \alpha_i k(x, x_i)$  (2)

$\mathrm{s.t.}\; \sum_{i=1}^{N} \alpha_i = 1$  (3)

$\alpha_i \ge 0, \; i = 1, 2, \ldots, N$  (4)
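To make equations (1)–(4) concrete, here is a minimal numerical sketch (ours, not the authors' code), assuming one-dimensional data and the Gaussian kernel with variance δ that Section 2 adopts; uniform weights α_i = 1/N satisfy constraints (3) and (4), and the integral in (1) is evaluated numerically purely for illustration.

```python
import numpy as np
from scipy.integrate import quad

def gaussian_kernel(x, xi, delta):
    # Gaussian kernel with variance delta (the parameterization used in Section 2)
    return np.exp(-(x - xi) ** 2 / (2.0 * delta)) / np.sqrt(2.0 * np.pi * delta)

def parzen_window(x, data, alphas, delta):
    # Parzen window p(x) of equation (2): a convex combination of kernels
    return np.sum(alphas * gaussian_kernel(x, data, delta))

def square_entropy(data, alphas, delta):
    # Square entropy of equation (1): H_p = 1 - integral of p(x)^2 dx,
    # evaluated here by 1-D numerical integration for illustration
    integral, _ = quad(lambda x: parzen_window(x, data, alphas, delta) ** 2,
                       -np.inf, np.inf)
    return 1.0 - integral

# Toy example: uniform weights alpha_i = 1/N, which satisfy (3) and (4).
rng = np.random.default_rng(0)
data = rng.normal(size=20)
alphas = np.full(data.size, 1.0 / data.size)
print(square_entropy(data, alphas, delta=0.5))
```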

2 Entropy-based Learning Machine
2.1 The optimization problem

In the binary classification, assume the training dataset is X = [x1, x2, …, xn], where xi (i = 1, …, n) is a training sample and n is the data size. When 1 ≤ i ≤ N1, the label of xi is +1; when N1 + 1 ≤ j ≤ n, the label of xj is −1.

The Parzen window of two classes can be respectively defined as follows.

$p_1(x) = \sum_{i=1}^{N_1} \alpha_i k_\delta(x, x_i)$  (5)

$\mathrm{s.t.}\; \sum_{i=1}^{N_1} \alpha_i = 1, \; \alpha_i \ge 0, \; i = 1, 2, \ldots, N_1$  (6)

$p_2(x) = \sum_{j=N_1+1}^{n} \beta_j k_\delta(x, x_j)$  (7)

$\mathrm{s.t.}\; \sum_{j=N_1+1}^{n} \beta_j = 1, \; \beta_j \ge 0, \; j = N_1+1, \ldots, n$  (8)

where $k_\delta(x, x_i)$ is the Gaussian kernel function and the parameter $\delta$ is its variance.

It can be seen from the concept of entropy that, in order to classify the stellar spectra efficiently, the entropies of the two classes should be minimized. The optimization problem of ELM is described as follows.

$\min\; \left(1 - \int p_1(x)^2 \, \mathrm{d}x\right) + \gamma \left(1 - \int p_2(x)^2 \, \mathrm{d}x\right) - \nu\rho$  (9)

$\mathrm{s.t.}\; y_i \left(p_1(x_i) - \lambda p_2(x_i)\right) > \rho, \; i = 1, \ldots, n$  (10)

where the parameter γ is the balance factor, γ = N2/N1, with N2 = n − N1 denoting the size of the second class; the parameter ρ is the margin between the distributions of the two classes; the parameter ν is the regulating factor; and the parameter λ is the prior coefficient, which reflects the ratio of the prior probabilities of the two classes.

By substituting equations (5) and (7) into (9), we obtain the following objective function.

$\min_{\alpha, \beta}\; 1 - \int \sum_{i=1}^{N_1} \sum_{j=1}^{N_1} \alpha_i \alpha_j k_\delta(x, x_i) k_\delta(x, x_j) \, \mathrm{d}x + \gamma - \gamma \int \sum_{i=N_1+1}^{n} \sum_{j=N_1+1}^{n} \beta_i \beta_j k_\delta(x, x_i) k_\delta(x, x_j) \, \mathrm{d}x - \nu\rho$  (11)

Reference [15] indicates that the following equation holds for the Gaussian kernel.

$\int k_\delta(x, x_i) k_\delta(x, x_j) \, \mathrm{d}x = k_{2\delta}(x_i, x_j)$  (12)
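Identity (12) can be checked numerically. The sketch below, an illustration under the same variance-δ parameterization and not part of the paper, integrates the product of two δ-variance kernels and compares the result with a single 2δ-variance kernel.

```python
import numpy as np
from scipy.integrate import quad

def gaussian_kernel(x, xi, delta):
    # Gaussian kernel with variance delta
    return np.exp(-(x - xi) ** 2 / (2.0 * delta)) / np.sqrt(2.0 * np.pi * delta)

xi, xj, delta = 0.3, 1.7, 0.5

# Left-hand side of (12): integral of the product of two delta-variance kernels.
lhs, _ = quad(lambda x: gaussian_kernel(x, xi, delta) * gaussian_kernel(x, xj, delta),
              -np.inf, np.inf)

# Right-hand side of (12): a single kernel with doubled variance 2*delta.
rhs = gaussian_kernel(xi, xj, 2.0 * delta)

print(lhs, rhs)  # the two values agree up to integration tolerance
```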

The objective function (11) can be simplified as follows.

$\min_{\alpha, \beta}\; 1 + \gamma - \sum_{i=1}^{N_1} \sum_{j=1}^{N_1} \alpha_i \alpha_j k_{2\delta}(x_i, x_j) - \gamma \sum_{i=N_1+1}^{n} \sum_{j=N_1+1}^{n} \beta_i \beta_j k_{2\delta}(x_i, x_j) - \nu\rho$  (13)

By substituting equations (5)–(8) into (10), we obtain the constraint conditions.

Based on the above analysis, the original optimization problem of ELM can be transformed as follows.

$\min_{\alpha, \beta}\; 1 + \gamma - \sum_{i=1}^{N_1} \sum_{j=1}^{N_1} \alpha_i \alpha_j k_{2\delta}(x_i, x_j) - \gamma \sum_{i=N_1+1}^{n} \sum_{j=N_1+1}^{n} \beta_i \beta_j k_{2\delta}(x_i, x_j) - \nu\rho$

$\mathrm{s.t.}\; \sum_{i=1}^{N_1} \alpha_i k_\delta(x_l, x_i) - \lambda \sum_{j=N_1+1}^{n} \beta_j k_\delta(x_l, x_j) > \rho, \; l = 1, \ldots, N_1$

$\lambda \sum_{j=N_1+1}^{n} \beta_j k_\delta(x_k, x_j) - \sum_{i=1}^{N_1} \alpha_i k_\delta(x_k, x_i) > \rho, \; k = N_1+1, \ldots, n$

$\sum_{i=1}^{N_1} \alpha_i = 1, \; \alpha_i \ge 0, \; i = 1, 2, \ldots, N_1$

$\sum_{j=N_1+1}^{n} \beta_j = 1, \; \beta_j \ge 0, \; j = N_1+1, \ldots, n$
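This program is nonconvex (it maximizes quadratic forms over simplices), so a general-purpose local solver is one plausible way to attack it. The following is a hedged sketch for one-dimensional toy data, not the authors' implementation: it poses the problem for SciPy's SLSQP, treating ρ as a free variable alongside α and β; the function names, default parameters and solver choice are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def kern(A, B, delta):
    # Gaussian kernel matrix with variance delta, for 1-D samples
    return (np.exp(-(A[:, None] - B[None, :]) ** 2 / (2.0 * delta))
            / np.sqrt(2.0 * np.pi * delta))

def fit_elm(x1, x2, delta=0.5, lam=1.0, nu=0.1):
    """Locally solve the ELM program above for two 1-D class samples x1, x2."""
    n1, n2 = len(x1), len(x2)
    gamma = n2 / n1                          # balance factor gamma = N2/N1
    K1 = kern(x1, x1, 2.0 * delta)           # k_{2 delta} within class 1
    K2 = kern(x2, x2, 2.0 * delta)           # k_{2 delta} within class 2
    K11, K12 = kern(x1, x1, delta), kern(x1, x2, delta)
    K21, K22 = kern(x2, x1, delta), kern(x2, x2, delta)

    def unpack(z):                           # z = [alpha (n1), beta (n2), rho]
        return z[:n1], z[n1:n1 + n2], z[-1]

    def objective(z):
        a, b, rho = unpack(z)
        # objective (13): 1 + gamma - a'K1a - gamma*b'K2b - nu*rho
        return 1.0 + gamma - a @ K1 @ a - gamma * (b @ K2 @ b) - nu * rho

    def margin1(z):                          # p1(x_l) - lam*p2(x_l) - rho >= 0
        a, b, rho = unpack(z)
        return K11 @ a - lam * (K12 @ b) - rho

    def margin2(z):                          # lam*p2(x_k) - p1(x_k) - rho >= 0
        a, b, rho = unpack(z)
        return lam * (K22 @ b) - K21 @ a - rho

    cons = [{'type': 'ineq', 'fun': margin1},
            {'type': 'ineq', 'fun': margin2},
            {'type': 'eq', 'fun': lambda z: np.sum(z[:n1]) - 1.0},
            {'type': 'eq', 'fun': lambda z: np.sum(z[n1:n1 + n2]) - 1.0}]
    bounds = [(0.0, None)] * (n1 + n2) + [(None, None)]
    z0 = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, 1.0 / n2), [0.0]])
    res = minimize(objective, z0, method='SLSQP', bounds=bounds, constraints=cons)
    return unpack(res.x)
```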

The decision function of ELM is defined as follows.

$f(x) = \mathrm{sgn}\left(p_1(x) - \lambda p_2(x)\right)$  (14)

where sgn(y) is the sign function: sgn(y) = 1 if y > 0, and sgn(y) = −1 otherwise. It can be seen from equation (14) that if f(x) > 0 the spectrum x belongs to the first class; otherwise it belongs to the second class.
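Continuing the sketch above, the decision rule (14) can be evaluated from the learned weights; this is again an illustrative assumption-laden sketch that reuses kern() and the output of fit_elm().

```python
import numpy as np

def elm_predict(x, x1, x2, alphas, betas, delta=0.5, lam=1.0):
    # Decision function (14): f(x) = sgn(p1(x) - lam * p2(x)),
    # with p1 and p2 the Parzen windows (5) and (7) built from the learned weights.
    x = np.atleast_1d(x)
    p1 = alphas @ kern(x1, x, delta)
    p2 = betas @ kern(x2, x, delta)
    return np.sign(p1 - lam * p2)            # +1: first class, -1: second class

# Usage (illustrative):
#   alphas, betas, rho = fit_elm(x1, x2)
#   elm_predict(0.0, x1, x2, alphas, betas)
```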

2.2 One-class classification

Rare spectra mining is quite important in astronomy, and it relies on one-class classification. When there exists only one class in the training dataset, the parameter γ and the margin ρ in the objective function (9) become zero. The objective function of ELM can then be reformulated as follows.

$\min\; 1 - \int p(x)^2 \, \mathrm{d}x$  (15)

By substituting equations (5) and (6) into (15), we obtain the following optimization problem.

$\min\; 1 - \int \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j k_\delta(x, x_i) k_\delta(x, x_j) \, \mathrm{d}x$  (16)

$\mathrm{s.t.}\; \sum_{i=1}^{N} \alpha_i = 1, \; \alpha_i \ge 0, \; i = 1, 2, \ldots, N$

Based on reference [15], by substituting (12) into (16), problem (16) can be rewritten as (17).

$\min\; 1 - \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j k_\sigma(x_i, x_j)$  (17)

where σ = 2δ.

Based on the above analysis, the optimization problem of ELM for the rare spectra mining problem is described as follows.

$\min\; 1 - \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j k_\sigma(x_i, x_j)$

$\mathrm{s.t.}\; \sum_{i=1}^{N} \alpha_i = 1, \; \alpha_i \ge 0, \; i = 1, 2, \ldots, N$
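A minimal sketch of this one-class problem follows, under the same assumptions as the earlier sketches (1-D toy data, Gaussian kernel with variance δ, SLSQP as a local solver for a nonconvex program); it is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def kern(A, B, delta):
    # Gaussian kernel matrix with variance delta (same helper as in Section 2.1 sketch)
    return (np.exp(-(A[:, None] - B[None, :]) ** 2 / (2.0 * delta))
            / np.sqrt(2.0 * np.pi * delta))

def fit_one_class(x, delta=0.5):
    # Minimize 1 - alpha' K_sigma alpha over the probability simplex,
    # with sigma = 2*delta as in equation (17); SLSQP returns a local solution.
    n = len(x)
    K = kern(x, x, 2.0 * delta)
    cons = [{'type': 'eq', 'fun': lambda a: np.sum(a) - 1.0}]
    res = minimize(lambda a: 1.0 - a @ K @ a, np.full(n, 1.0 / n),
                   method='SLSQP', bounds=[(0.0, None)] * n, constraints=cons)
    return res.x
```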

3 Comparative Experiments

The performance of the proposed spectra classification method ELM was investigated in this section. Traditional classifiers such as kNN and SVM are widely used in practice and both perform well in stellar spectra classification; therefore, kNN and SVM were compared with the proposed method. The experimental environment was an Intel Core i3 CPU, 4 GB RAM, Windows 7 and Matlab 7.0. Stellar spectra from the Sloan Digital Sky Survey (SDSS) Data Release 8 were used as the experimental datasets. These data, with wavelength coverage from 3 800 to 9 000 Å, have been shifted to a common rest-frame and normalized to a constant total flux.

The classification process was composed of the following steps; an illustrative code sketch follows the list.

Step 1: The experimental dataset was divided into two parts, one for training and the other for testing.

Step 2: PCA was used to reduce the dimension of the stellar spectra; the details of PCA can be found in reference [16].

Step 3: The test dataset was mapped into the low-dimensional subspace.

Step 4: kNN, SVM and the proposed method were respectively applied in the low-dimensional subspace for classification.
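The following is a hypothetical end-to-end sketch of Steps 1–4 using scikit-learn; the split fraction, number of principal components and classifier settings are illustrative assumptions rather than the paper's actual settings, and the proposed ELM would slot in alongside the baselines at Step 4.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def run_pipeline(spectra, labels, train_fraction=0.3, n_components=10):
    # Step 1: split the dataset into training and test parts.
    X_tr, X_te, y_tr, y_te = train_test_split(
        spectra, labels, train_size=train_fraction, stratify=labels, random_state=0)

    # Step 2: fit PCA on the training spectra to reduce the dimensionality.
    pca = PCA(n_components=n_components).fit(X_tr)
    Z_tr = pca.transform(X_tr)

    # Step 3: map the test spectra into the same low-dimensional subspace.
    Z_te = pca.transform(X_te)

    # Step 4: run the baseline classifiers in the reduced space
    # (ELM would be applied here in the same way).
    for name, clf in [('kNN', KNeighborsClassifier(n_neighbors=5)),
                      ('SVM', SVC(kernel='rbf'))]:
        acc = clf.fit(Z_tr, y_tr).score(Z_te, y_te)
        print(f'{name}: {acc:.3f}')
```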

3.1 The binary classification experiments

The datasets consisted of 4 subclasses of K-type spectra (K1, K3, K5 and K7) with signal-to-noise ratios (SNRs) 10 < SNR < 20, and 3 subclasses of F-type spectra (F2, F5 and F9) with 50 < SNR < 60, as listed in Tables 1(a) and (b).

Table 1 (a) The total number of K stars with 10 < SNR < 20
Table 1 (b) The total number of F stars with 50 < SNR < 60

In our experiments, 30%, 40%, 50%, 60% and 70% of the K- and F-type spectral datasets were respectively used for training, and the remainders were used for testing. Tables 2(a) and (b) give the comparative experimental results.

Table 2 (a) The comparative experimental results on the K-type dataset
Table 2 (b) The comparative experimental results on the F-type dataset

It can be seen from Tables 2(a) and (b) that the classification accuracies of kNN, SVM and ELM rise with the size of the training datasets. In view of the classification results, ELM performs better than kNN and SVM; in terms of average performance, the classification accuracy of ELM is clearly higher than that of kNN and SVM.

3.2 The one-class classification experiments

The experimental datasets consisted of 3 subclasses of G-type spectra (G0, G2 and G5) with SNRs 60 < SNR < 65, as listed in Table 3.

Table 3 The total number of G stars with 60< SNRs< 65

The rare spectra and the majority of the stellar spectra were listed in Table 4.

Table 4 The majority vs. the rareness of G stars with 60< SNRs< 65

30%, 40%, 50%, 60% and 70% of the above spectral datasets were respectively used for training, and the remaining data were used for testing. The classifiers kNN, SVM and ELM were respectively applied to investigate their performance. The classification accuracies are shown in Table 5, whose first two columns each comprise three parts: the total number of training or test spectra, its proportion, and the number of rare spectra used for training or testing.

Table 5 The experimental results on the G-type dataset

It can be seen from Table 5 that the classification accuracies of kNN, SVM and ELM rise with the growth of the training size. Compared with kNN and SVM, ELM performs much better at distinguishing the rare spectra on the G-type dataset. In view of the average classification accuracy, ELM performs better than kNN and SVM. It can be concluded that ELM is good at mining the rare spectra from the majority of the spectra.

4 Conclusions

In order to improve stellar spectra classification performance, the Entropy-based Learning Machine (ELM) was proposed in this paper. In ELM, entropy theory and the Parzen window were introduced to construct the optimization problem, so that ELM takes the data distribution into consideration. ELM can deal with the binary classification problem and also performs well in mining rare spectra. Comparative experiments on the SDSS datasets verified the effectiveness of the proposed method.

The authors have declared that no competing interests exist.

References
[1] Goebel J, Stutz J, Volk K, et al. A&A, 1989, 222(1-2): 5.
[2] Gulati R K, Gupta R, Gothoskar P. A&A, 1997, 322(3): 933.
[3] Stark P B, Herron M M, Matteson A. Applied Spectroscopy, 1993, 47(11): 1820.
[4] Mahdi B. Astrophysics and Space Science, 2012, 337(1): 93.
[5] Navarro S G, Corradi R L M, Mampaso A. A&A, 2012, 538(A76): 1.
[6] Li X R, Hu Z Y. International Journal of Computer Vision, 2010, 89(1): 1.
[7] Qin Dongmei, Hu Zhanyi, Zhao Yongheng. Spectroscopy and Spectral Analysis, 2003, 23(1): 182.
[8] Liu Z T, Li X R, Wu F C, et al. Chinese Journal of Electronics, 2007, 35(1): 157.
[9] Xu Xin, Yang Jinfu, Wu Fuchao, et al. Spectroscopy and Spectral Analysis, 2006, 26(10): 1960.
[10] Yang J F, Wu C F, Luo A L, et al. Pattern Recognition and Artificial Intelligence, 2006, 19(3): 368.
[11] Cai J H, Zhao X J, Sun S W, et al. RAA, 2013, 13(3): 334.
[12] Liu Z B, Song L P, Zhao W J. MNRAS, 2016, 455(4): 4289.
[13] Penalver B A, Escolano F. IEEE TNN, 2009, 20(11): 1756.
[14] Lv S G, Feng Y L. Journal of Mathematical Analysis and Applications, 2012, 386(1): 205.
[15] Kim J, Scott C D. IEEE TPAMI, 2010, 32(10): 1822.
[16] Deeming T J. MNRAS, 1964, 127: 493.