Cross-Modal Dual-Channel Camouflaged Object Detection Method for Visible-Spectrum Image

doi:10.3964/j.issn.1000-0593(2025)09-2632-10

Abstract: Camouflaged object detection (COD) in visible-spectrum images aims to use visible-spectrum information to detect camouflaged objects that are visually consistent with their surroundings. This visual consistency makes object boundaries hard to distinguish and discriminative features hard to learn, which limits the effectiveness of existing object detection methods on COD. To address this, a Cross-modal Dynamic Collaborative Dual-channel Network (CDCDN) is proposed to explore the potential of global-local multi-level visual perception and vision-language models (VLMs) in COD. First, to address the difficulty of distinguishing object boundaries, a dynamic collaborative dual-channel module is designed. Its two channels decouple detection into global information localization and local feature refinement, so that detection and optimization are carried out from a multi-level visual perspective. On this basis, a dynamic information collaboration and fusion mechanism is built, in which global gating constraints and local perception correction allow global and local information to complement and correct each other, enhancing the model's spatial capture capability in scenes with blurred object boundaries. Second, to address the difficulty of learning discriminative features, a cross-modal scene-object matching module is designed. By incorporating a pre-trained VLM, this module establishes cross-modal interaction between the visual and language modalities, which increases the separation between objects and backgrounds in the feature space and improves the model's semantic discrimination in scenes that lack discriminative features. CDCDN is evaluated on the MHCD2022 and COD10K datasets using the mAP@0.5, mAP@0.5:0.95, and mAP@0.75 metrics. It achieves 67.6%, 42.6%, and 48.4%, respectively, on MHCD2022, and 67.9%, 40.6%, and 41.0%, respectively, on COD10K. Compared with five mainstream object detection methods (Faster R-CNN, DETR, Lite-DETR, YOLOv5, and YOLOv10), CDCDN achieves the best detection accuracy on all three metrics. Visualized detection results in four common camouflage scenes (barren land, grassland, woodland, and snowfield) further confirm that CDCDN adapts well to different scenes. In the ablation study, the key components of CDCDN are removed step by step to systematically evaluate their contributions, and the results show that each key component contributes to the model's detection performance. Overall, the experimental results indicate that CDCDN can accurately detect camouflaged objects that are highly visually consistent with their surroundings, providing a new solution for COD.

Key words: Visible spectrum; Camouflaged object detection; Dual-channel decoupling; Information collaboration; Cross-modal interaction
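
The three metrics named in the abstract differ only in how strictly a predicted box must overlap the ground-truth box before it counts as a correct detection. The sketch below illustrates that overlap criterion. It is not taken from the paper: it assumes the common COCO-style convention (IoU thresholds of 0.5 and 0.75, and an average over 0.50 to 0.95 in steps of 0.05 for mAP@0.5:0.95), and the names `iou`, `is_true_positive`, and `COCO_THRESHOLDS` are hypothetical helpers introduced here for illustration.

```python
from typing import Sequence

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed layout).
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float) -> bool:
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


# Thresholds averaged by mAP@0.5:0.95 under the assumed COCO-style convention.
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    pred = (2.0, 0.0, 12.0, 10.0)  # shifted 2 px to the right
    print(f"IoU = {iou(pred, gt):.3f}")
    for t in (0.5, 0.75):
        print(f"threshold {t}: match = {is_true_positive(pred, gt, t)}")
```

In this example, a box shifted by one fifth of its width still counts as a match at the 0.5 threshold but not at 0.75. This is why mAP@0.75 and mAP@0.5:0.95 are more sensitive to boundary localization than mAP@0.5, which matters when object boundaries blend into the background.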

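Received: 2024-12-31 Revised: 2025-05-20

CLC number: TP391

Funding: Supported by the National Natural Science Foundation of China (62176259, 62373364, 62303468) and the Natural Science Foundation of Jiangsu Province (BK20221116)

Corresponding author: WANG Xue-song, E-mail: wangxuesongcumt@163.com

About the author: CHENG Yu-hu, born in 1973, professor at the School of Information and Control Engineering, China University of Mining and Technology, E-mail: chengyuhu@163.com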

Cite this article:

CHENG Yu-hu, WU Shi-jia, WANG Hao-yu, WANG Xue-song. Cross-Modal Dual-Channel Camouflaged Object Detection Method for Visible-Spectrum Image[J]. Spectroscopy and Spectral Analysis, 2025, 45(09): 2632-2641.

Link to this article:

https://www.gpxygpfx.com/CN/10.3964/j.issn.1000-0593(2025)09-2632-10 or https://www.gpxygpfx.com/CN/Y2025/V45/I09/2632
