|
|
|
|
|
|
A Multi-Task Convolutional Neural Network for Infrared and Visible Multi-Resolution Image Fusion |
ZHU Wen-qing1, 2, 3, ZHANG Ning1, 2, 3, LI Zheng1, 2, 3*, LIU Peng1, 3, TANG Xin-yi1, 3 |
1. Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Key Laboratory of Infrared System Detection and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China
|
|
|
Abstract Infrared and visible image fusion has long been a research hotspot in the image-processing field. Fusion technology can compensate for the deficiencies of a single sensor and provide a good imaging foundation for image understanding and analysis. Owing to limitations of manufacturing technology and cost, the resolution of infrared detectors is much lower than that of visible detectors, which greatly limits practical use. A multi-task convolutional neural network framework that combines the infrared super-resolution and image fusion tasks is proposed and applied to infrared and visible multi-resolution image fusion. In terms of network structure, firstly, a dual-channel network is designed to extract infrared and visible features separately, so that the proposed algorithm is not limited by the resolution of either source image. Secondly, a feature up-sampling block is proposed: bilinear interpolation increases the number of pixels, and a multilayer perceptron then refines the mapping between the smooth pixel space and the high-frequency space. Infrared images can therefore be reconstructed at arbitrary scales, including scales for which no training task is provided. Furthermore, a linear self-attention mechanism is introduced into the network to learn the nonlinear relationships between feature-space positions, suppress irrelevant information, and enhance the expression of global information. In terms of the loss function, a gradient loss is proposed: at each position, the filter response with the larger absolute value in the infrared and visible images is retained, and the Frobenius norm between this target and the response of the reconstructed fusion image is computed. Fusion images can thus be generated without ideal ground-truth images to supervise network learning. Finally, the fused and high-resolution infrared images are reconstructed simultaneously by optimizing the multi-task model under the combined action of the gradient loss and the pixel loss. 
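The gradient loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: simple forward differences stand in for whatever gradient filter the network actually uses, and the function names are hypothetical.

```python
import numpy as np

def gradients(img):
    # Forward differences as a stand-in gradient operator (assumption);
    # edge rows/columns are padded by repetition to keep shapes equal.
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return gx, gy

def gradient_loss(ir, vis, fused):
    # For each direction, keep the source response with the larger
    # absolute value, then measure the Frobenius norm of its
    # difference from the fused image's response.
    total = 0.0
    for g_ir, g_vis, g_f in zip(gradients(ir), gradients(vis), gradients(fused)):
        target = np.where(np.abs(g_ir) >= np.abs(g_vis), g_ir, g_vis)
        total += np.linalg.norm(target - g_f)  # Frobenius norm
    return total
```

Because the target is built from the source images themselves, this loss needs no ideal fused image as ground truth, which is the point made in the abstract.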
The proposed approach is trained on the RoadScene dataset and compared with four related algorithms on the TNO dataset. In terms of subjective performance, the proposed method accepts source images of arbitrary resolution, and the fusion images have prominent infrared targets and rich visible details. Even when the resolutions of the source images differ considerably, the proposed method can still reconstruct high-resolution infrared images with clear features, and it generalizes robustly. Objective performance is excellent on multiple evaluation metrics, such as entropy, the sum of the correlations of differences (SCD), and spatial frequency. The experimental results demonstrate that the fusion images carry a large amount of information, a high information-conversion rate, and high clarity, which verifies the effectiveness of the proposed method.
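Two of the metrics named above have standard closed-form definitions that can be sketched directly; the snippet below is an illustrative NumPy version of those textbook formulas, not the authors' evaluation code.

```python
import numpy as np

def entropy(img, bins=256):
    # Shannon entropy of the grey-level histogram (assumes 8-bit range).
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def spatial_frequency(img):
    # SF = sqrt(RF^2 + CF^2), where RF/CF are the RMS of the
    # horizontal/vertical first differences.
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)
```

Higher entropy indicates more information content in the fused image, while higher spatial frequency indicates greater clarity, matching how the abstract interprets these scores.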
|
Received: 2021-12-10
Accepted: 2022-02-23
|
|
Corresponding Authors:
LI Zheng
E-mail: Lizheng_sitp@163.com
|
|
|
|
|
|