Research on the Classification Method of Benthic Fauna Based on
Hyperspectral Data and Random Forest Algorithm
DONG Jian-jiang1, TIAN Ye1, ZHANG Jian-xing2, LUAN Zhen-dong2*, DU Zeng-feng2*
1. College of Physics and Optoelectronic Engineering, Ocean University of China, Qingdao 266100, China
2. Key Laboratory of Marine Geology and Environment & Center of Deep Sea Research, Institute of Oceanology, Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China
Abstract: This study aims to identify underwater benthic animals in situ by classifying their hyperspectral data with the random forest algorithm, so that the data can be mined more deeply and decisions about the target organisms can be made more efficiently and reliably. Hyperspectral data of five common economic animals (scallop, ctenophore scallop, veined red snail, wrinkled disc abalone, and imitation spiny ginseng) were acquired in different underwater environments, normalized, and processed with three machine-learning methods: random forest (RF), random forest based on principal component analysis (PCA-RF), and random forest based on recursive feature elimination (RFE-RF). The three methods were used to classify the five benthic species and were compared with one another. The RF variable importance ranking was used to select the reflectance spectral intensity data of the highest-ranked bands, i.e. the bands that contribute most to the model. The data of these top-ranked feature bands were then input into the classifier, and the classification accuracy was obtained after parameter optimization. The classification results were output as a confusion matrix, which shows how each of the five samples was identified. The veined red snail had the lowest recognition accuracy, 64%; the imitation spiny ginseng and the ctenophore scallop had the highest, 100%; the scallop and the wrinkled disc abalone reached 91% and 96%, respectively. The final classification accuracies of the three methods were 90.13% for RF, 95.20% for PCA-RF, and 98.74% for RFE-RF, which shows that the random forest algorithm is feasible for classifying underwater hyperspectral data.
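The three-way comparison described in the abstract (RF, PCA-RF, RFE-RF, each evaluated with a confusion matrix, plus a band ranking by RF variable importance) can be sketched as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the spectra are synthetic stand-ins, and the band count, min-max normalization, number of principal components, number of retained bands, and forest size are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the measured spectra: 5 classes, 40 spectra each,
# 50 bands, every class with a reflectance peak centred on a different band.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_bands = 5, 40, 50
bands = np.arange(n_bands)
centres = np.linspace(5, 45, n_classes)
X = np.vstack([
    np.exp(-0.5 * ((bands - c) / 3.0) ** 2)
    + rng.normal(0.0, 0.1, (n_per_class, n_bands))
    for c in centres
])
y = np.repeat(np.arange(n_classes), n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def forest():
    """Return a fresh random forest (size and seed are assumed values)."""
    return RandomForestClassifier(n_estimators=100, random_state=0)


# Each pipeline normalizes the bands, optionally reduces them, then classifies.
models = {
    "RF": make_pipeline(MinMaxScaler(), forest()),
    "PCA-RF": make_pipeline(
        MinMaxScaler(), PCA(n_components=5, random_state=0), forest()
    ),
    "RFE-RF": make_pipeline(
        MinMaxScaler(), RFE(forest(), n_features_to_select=10, step=5), forest()
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are true classes; normalize="true" gives per-class recognition rates.
        "confusion": confusion_matrix(y_test, y_pred, normalize="true"),
    }
    print(name, round(results[name]["accuracy"], 3))

# Rank bands by RF variable importance and keep the ten highest-ranked ones.
rf = models["RF"].named_steps["randomforestclassifier"]
top_bands = np.argsort(rf.feature_importances_)[::-1][:10]
print("top bands:", top_bands)
```

The diagonal of each row-normalized confusion matrix corresponds to the per-species recognition accuracies quoted above. With real data, the forest and selection parameters would be tuned (for example by cross-validation) instead of fixed as here.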
DONG Jian-jiang, TIAN Ye, ZHANG Jian-xing, LUAN Zhen-dong, DU Zeng-feng. Research on the Classification Method of Benthic Fauna Based on Hyperspectral Data and Random Forest Algorithm [J]. Spectroscopy and Spectral Analysis(光谱学与光谱分析), 2023, 43(10): 3015-3022.