Research on Effectiveness of the Pre-Training Model in Improving the Performance of Spectral Feature Extraction
REN Ju-xiang1, LIU Zhong-bao2*
1. College of Information Engineering, Shanxi Vocational University of Engineering Science and Technology, Jinzhong 030619, China
2. School of Information Science, Beijing Language and Culture University, Beijing 100083, China
Abstract Advances in observation technology have produced massive volumes of spectral data. How to classify these data automatically has attracted considerable attention from researchers, and feature extraction is the most important step in that task. Given the limitations of manual processing, most studies use machine learning algorithms to extract features from spectral data. However, these algorithms cannot handle massive spectral data because of their high space and time complexity. The pre-trained models that have emerged in recent years have excellent feature extraction capabilities, yet there is little research on their effectiveness for feature extraction from spectral data. Therefore, this paper takes stellar spectral data as the research object, introduces the pre-trained models BERT, ALBERT, and GPT, together with a Convolutional Neural Network (CNN), for feature extraction and classification of the stellar spectral data, and verifies the effectiveness of these pre-trained models for feature extraction by comparing the experimental results. The spectral classification program is written in Python. After feature extraction with the pre-trained models, a CNN model in TensorFlow 1.14 is used to classify the spectral data. The experimental dataset is the SDSS DR10 stellar spectral dataset, which includes K-type, F-type, and G-type stars. Grid search and 5-fold cross-validation are used to obtain the optimal experimental parameters. Under the same experimental conditions, the BERT model achieves higher classification accuracy than ALBERT and GPT. The average classification accuracy of BERT is higher than that of ALBERT by 0.0251, 0.0215, and 0.0225, and higher than that of GPT by 0.0497, 0.0424, and 0.0432, on the K-type, F-type, and G-type stellar datasets, respectively.
The experimental results support the following conclusions. First, classification accuracy improves as the scale of the training data increases. Second, with the same amount of training data, each model achieves its highest classification accuracy on K-type stars, followed by F-type and G-type stars. Third, the BERT model has the best feature extraction ability compared with ALBERT and GPT.
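The parameter selection step mentioned in the abstract (grid search with 5-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' program: the function names, the toy one-dimensional data, and the `threshold_classifier` stand-in for "extract features with a pre-trained model, train the CNN, and score it" are all hypothetical, chosen only so the example runs with the Python standard library.

```python
import random
from itertools import product


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(train_and_score, X, y, params, k=5, seed=0):
    """Average held-out accuracy of one parameter setting over k folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(
            [X[j] for j in train_idx], [y[j] for j in train_idx],
            [X[j] for j in test_idx], [y[j] for j in test_idx],
            **params))
    return sum(scores) / len(scores)


def grid_search(train_and_score, X, y, grid, k=5, seed=0):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(train_and_score, X, y, params, k, seed)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def threshold_classifier(X_train, y_train, X_test, y_test, threshold):
    """Hypothetical stand-in for the real pipeline.

    It ignores the training data and predicts class 1 when x > threshold,
    then returns the accuracy on the held-out fold.
    """
    correct = sum((x > threshold) == bool(label)
                  for x, label in zip(X_test, y_test))
    return correct / len(X_test)


# Toy data (an assumption for illustration): one feature, two classes.
X = [i / 100 for i in range(100)]
y = [1 if x > 0.5 else 0 for x in X]

best_params, best_score = grid_search(
    threshold_classifier, X, y, grid={"threshold": [0.2, 0.5, 0.8]})
print(best_params, best_score)
```

In a real experiment, `threshold_classifier` would be replaced by a function that trains the classifier on the training folds and reports its accuracy on the held-out fold; the grid would then hold hyperparameters such as learning rate or batch size.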
Received: 2024-01-06
Accepted: 2024-04-11
Corresponding Author:
LIU Zhong-bao
E-mail: liu_zhongbao@hotmail.com