Transformer-Based Method for Segmentation of Gastric Cancer Microscopic Hyperspectral Images
ZHANG Ran1, 2, JIN Wei1, 2, MU Ying1, YU Bing-wen2, BAI Yi-wen2, SHAO Yi-bo1, 2, PING Jin-liang3*, SONG Peng-tao3, HE Xiang-yi3, LIU Fei3, FU Lin-lin3
1. College of Control Science and Engineering, Zhejiang University, Hangzhou 310058, China
2. Huzhou Institute of Zhejiang University, Huzhou 313000, China
3. Huzhou Central Hospital, Huzhou 313000, China
Abstract: Gastric cancer is the third leading cause of cancer-related deaths globally and poses a serious threat to human life and health. Early identification of gastric cancer lesions is therefore crucial for early diagnosis. As an emerging technique, microscopic hyperspectral imaging simultaneously captures rich spectral and spatial information from biological tissue at the microscopic level, providing a new approach to early diagnosis from pathological slices. In this paper, microscopic hyperspectral images of gastric cancer in the 400~1 000 nm range were collected with a microscopic hyperspectral imaging system, and a dataset of 230 images was constructed after preprocessing such as spectral calibration. Although spatial attention-based methods have achieved significant results in image classification, segmentation, and other fields, they still suffer from high computational complexity and insufficient use of spectral information when applied to hyperspectral images. This paper therefore proposes a backbone network based on convolution and attention mechanisms, called the Mixing Dual-Branch Transformer (MDBT). By alternately applying spatial and channel mixing modules, the model aggregates spatial and channel features both between blocks and within blocks. Specifically, it combines dual branches of window attention and convolution with spatial and channel interaction structures. This design not only reduces computational complexity but also enables window-to-window information interaction and feature fusion through convolutional interaction, overcoming the limited receptive field of window attention and further improving the global modeling ability of the Transformer.
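The complexity argument for window attention can be made concrete with the standard multiply-accumulate estimates for self-attention given in the Swin Transformer paper. The sketch below is illustrative only: the feature-map size, channel count, and window size are assumed values rather than MDBT's actual configuration, and the cost of the convolution branch is not counted.

```python
def global_attention_cost(h: int, w: int, c: int) -> int:
    """Approximate multiply-accumulates of global self-attention
    on an h x w feature map with c channels: 4*h*w*c^2 + 2*(h*w)^2*c."""
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def window_attention_cost(h: int, w: int, c: int, m: int) -> int:
    """Same estimate when attention is restricted to non-overlapping
    m x m windows: 4*h*w*c^2 + 2*m^2*h*w*c."""
    n = h * w
    return 4 * n * c * c + 2 * (m * m) * n * c


if __name__ == "__main__":
    # Assumed example sizes (not taken from the paper).
    h, w, c, m = 56, 56, 96, 7
    full = global_attention_cost(h, w, c)
    windowed = window_attention_cost(h, w, c, m)
    print(f"global attention : {full:,} MACs")
    print(f"window attention : {windowed:,} MACs")
    print(f"ratio            : {full / windowed:.1f}x")
```

The quadratic term in the number of tokens becomes linear once the window size is fixed, which is why restricting attention to windows cuts the cost but also narrows the receptive field that the convolutional interaction is meant to widen.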
In the image segmentation experiments, the UperNet model is adopted as the decode head to reconstruct the features extracted by the backbone network into the final segmentation results. Five-fold cross-validation on the collected gastric cancer hyperspectral dataset shows that the proposed model reaches an average Dice of 85.39 and an mIoU of 74.66, outperforming mainstream image segmentation networks such as UNet, Swin, PVT, and ViT. Ablation experiments further verify the contribution of the proposed spatial and channel dual mixing modules, the convolution and window-attention dual branches, and other structures to the results. The experimental results demonstrate that the proposed MDBT model effectively exploits the rich spatial and spectral information in hyperspectral images and improves the accuracy of gastric cancer image segmentation, supporting the research significance and application value of microscopic hyperspectral imaging in gastric cancer diagnosis.
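The evaluation protocol described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the shuffling seed, the helper names, and the per-class averaging of IoU and Dice are assumptions, since the abstract does not define how its metrics are computed.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into disjoint validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou_and_dice(cm):
    """Average per-class IoU and Dice over classes that occur at all."""
    n = len(cm)
    ious, dices = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[r][k] for r in range(n)) - tp
        if tp + fp + fn == 0:
            continue  # class absent from both labels and predictions
        ious.append(tp / (tp + fp + fn))
        dices.append(2 * tp / (2 * tp + fp + fn))
    return sum(ious) / len(ious), sum(dices) / len(dices)


if __name__ == "__main__":
    # 230 images, as in the dataset described in the abstract.
    folds = five_fold_indices(230)
    print([len(f) for f in folds])

    # Toy flattened label maps (hypothetical values).
    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]
    miou, mdice = mean_iou_and_dice(confusion_matrix(y_true, y_pred, 2))
    print(f"mIoU={miou:.4f}  mDice={mdice:.4f}")
```

In each round one fold serves as the validation set and the remaining four as training data; the reported figures are then averaged over the five rounds.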
Key words: Microscopic hyperspectral; Image segmentation; Deep learning; Transformer
ZHANG Ran, JIN Wei, MU Ying, YU Bing-wen, BAI Yi-wen, SHAO Yi-bo, PING Jin-liang, SONG Peng-tao, HE Xiang-yi, LIU Fei, FU Lin-lin. Transformer-Based Method for Segmentation of Gastric Cancer Microscopic Hyperspectral Images [J]. Spectroscopy and Spectral Analysis, 2025, 45(02): 551-557.