1 Introduction

Due to the limitations of any single sensor, multi-sensor images are often used to provide a comprehensive description of a target. Image fusion combines multi-sensor images into a single, more comprehensive fused image for visual perception and computer processing. Image fusion has been widely used in many applications, such as visible and infrared photography, medical detection, remote sensing imagery, and so on.

In early studies, researchers usually defined the significance and match measures of image content directly in the original or a transform domain. A series of approaches work in different domains, among which multi-scale transformation [1,2,3,4] and sparse representation [5, 6] are the most typical. Many approaches also strive to define better indexes and fusion rules [7]. A more detailed introduction to these algorithms can be found in the excellent review by Li [8].

Since multi-sensor images provide descriptions of the same target captured with different sensor types and environments, some relationships can be expected to exist among them [9]. The same target leads to redundancy, while the variety of sensor types and environments generates complementarity in information. Some novel approaches are designed to work with the redundancy and complementarity of the source images through sophisticated manual definitions of this relationship. For example, SIRF for fusing satellite multi-spectral and panchromatic images [10], infrared-visible image fusion [11], and MRSR in multi-focus image fusion [12] have been shown to outperform previous work. Other approaches design algorithms that automatically reveal the implicit relationship among source images [13, 14].

In this paper, inspired by [13, 15], a novel algorithm is proposed for exploiting redundancy and complementarity via transfer learning, a coupled dictionary, and specific prior knowledge. After separating the information of the source images, a fusion scheme is designed to obtain the final fused image.

The rest of this paper is organized as follows. Related work is introduced and discussed in Sect. 2. Section 3 describes our work in detail, and Sect. 4 compares the proposed method with state-of-the-art algorithms.

2 Related Work

2.1 Layer Division Based Image Fusion

The relationship of redundancy and complementarity can be mathematically modeled in the following probabilistic form. To simplify the discussion, we consider only the situation with two source images.

$$\begin{aligned} P(x_1) = P(x_1\overline{x_2})+P(x_1x_2) \end{aligned}$$
(1)
$$\begin{aligned} P(x_2) = P(x_2\overline{x_1})+P(x_1x_2) \end{aligned}$$
(2)

This formulation divides the original image content into two parts. The left-hand sides of both equations represent the marginal distributions of the source images \(x_1\) and \(x_2\). \(P(x_1\overline{x_2})\) and \(P(x_2\overline{x_1})\) are the individual components, while \(P(x_1x_2)\) is the correlated component. Essentially, the core task of exploiting redundancy and complementarity is to estimate the individual and correlated components across the two images.

Yu [13] extracts features of the source images through a K-SVD dictionary and labels them by the proposed joint sparse representation (JSR). Finally, the correlated layer and individual layers are reconstructed from the labeled features respectively. This amounts to approximating the correlated and individual components with K-SVD dictionary atoms.

$$\begin{aligned} P(x) = P(f(\theta _C))+P(f(\theta _I)),\theta _C+\theta _I = \theta \end{aligned}$$
(3)

\(\theta \) denotes all atoms in the K-SVD dictionary, while \(\theta _C\) and \(\theta _I\) are the correlated and individual features. \(f(\cdot )\) represents the sparse reconstruction procedure for estimating the ideal layer division.
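As a concrete illustration of (3), the following minimal numpy sketch reconstructs the two layers from labeled dictionary atoms. It is a deliberate simplification: least-squares coefficients stand in for true sparse coding, and the atom labels \(\theta _C\) and \(\theta _I\) are assumed to be given.

```python
import numpy as np

def reconstruct_layers(x, D, corr_idx, indiv_idx):
    """Approximate x with coefficients over dictionary D, then rebuild the
    correlated layer f(theta_C) and individual layer f(theta_I) of Eq. (3)
    from the labeled atom subsets."""
    a, *_ = np.linalg.lstsq(D, x, rcond=None)   # coefficients over all atoms
    x_corr = D[:, corr_idx] @ a[corr_idx]       # correlated layer
    x_indiv = D[:, indiv_idx] @ a[indiv_idx]    # individual layer
    return x_corr, x_indiv

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 8))                # toy dictionary with 8 atoms
x = D @ rng.standard_normal(8)                  # signal spanned by D
xc, xi = reconstruct_layers(x, D, [0, 1, 2, 3], [4, 5, 6, 7])
# because every atom is assigned to exactly one layer, the layers sum to x
assert np.allclose(xc + xi, x)
```

Note that the exactness of the sum only holds because the toy signal lies in the dictionary's span; with a real learned dictionary, the reconstruction error discussed below remains.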

The more precise the dictionary, the better the divided layers [15]. Unfortunately, existing dictionary training algorithms cannot guarantee that the features are orthogonal and exactly precise, or that reconstruction is error-free. As illustrated in Fig. 1, these defects make the model imprecise.

In consideration of the difficulty of estimating the individual and correlated parts directly, we propose a novel model that estimates the posterior probability, since the marginal probability is known. We reformulate (1) and (2) into the following form.

$$\begin{aligned} P(x_1) = P(x_1\overline{x_2})+P(x_1|x_2)\cdot P(x_2) \end{aligned}$$
(4)
$$\begin{aligned} P(x_2) = P(x_2\overline{x_1})+P(x_2|x_1)\cdot P(x_1) \end{aligned}$$
(5)

In this form, a more precise estimation of the joint probability is obtained by fusing the posterior probabilities. This task has been studied for many years in transfer learning and coupled dictionary techniques, which are introduced in Sect. 2.2.
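The reformulation can be sanity-checked on a toy joint distribution; the numbers below are illustrative only:

```python
import numpy as np

# Toy joint distribution over two binary events x1, x2 (illustrative numbers).
p11 = 0.3    # P(x1 x2): correlated part
p10 = 0.25   # P(x1 !x2): individual part of x1
p01 = 0.2    # P(!x1 x2): individual part of x2
p_x1, p_x2 = p11 + p10, p11 + p01               # marginals, as in (1) and (2)

# Eq. (4): P(x1) = P(x1 !x2) + P(x1|x2) * P(x2)
p_x1_given_x2 = p11 / p_x2
assert np.isclose(p_x1, p10 + p_x1_given_x2 * p_x2)

# Symmetric check of Eq. (5): P(x2) = P(x2 !x1) + P(x2|x1) * P(x1)
p_x2_given_x1 = p11 / p_x1
assert np.isclose(p_x2, p01 + p_x2_given_x1 * p_x1)
```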

Fig. 1. Layer division model

2.2 Introduction of Transfer Learning and Coupled Dictionary

Transfer learning is a technique that aims to improve performance on a target task by transferring knowledge from a source task [16]. The core of transfer learning is the mechanism for extracting and transferring knowledge from the source to the target. Some works [17, 18] focus on transferring knowledge across two unlabeled domains. To the best of our knowledge, transfer learning has not previously been applied to image fusion tasks, even though it matches the needs of image fusion well.

Coupled dictionary training is often used to learn feature spaces that associate cross-domain image data and jointly improve the representative ability of each dictionary [19]. Gao et al. proposed a multi-focus image fusion approach based on a coupled dictionary [20].
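To make the coupling idea concrete, the sketch below shows one simple alternating least-squares scheme in which two dictionaries are tied together through a single shared code matrix. This is an illustrative stand-in, not the actual algorithm of [19] or [20].

```python
import numpy as np

def coupled_train(X1, X2, n_atoms=6, n_iter=10, seed=0):
    """Alternating least-squares sketch of coupled dictionary training:
    both domains share one code matrix A, which couples D1 and D2.
    (Illustrative only; real coupled dictionary learning adds sparsity.)"""
    rng = np.random.default_rng(seed)
    D1 = rng.standard_normal((X1.shape[0], n_atoms))
    D2 = rng.standard_normal((X2.shape[0], n_atoms))
    for _ in range(n_iter):
        D = np.vstack([D1, D2])                     # stack both domains
        X = np.vstack([X1, X2])
        A, *_ = np.linalg.lstsq(D, X, rcond=None)   # shared codes for both
        D1 = X1 @ np.linalg.pinv(A)                 # refit each dictionary
        D2 = X2 @ np.linalg.pinv(A)
    err = np.linalg.norm(X1 - D1 @ A) ** 2 + np.linalg.norm(X2 - D2 @ A) ** 2
    return D1, D2, A, err
```

Each alternating step minimizes the shared reconstruction objective given the other variables, so the total error is non-increasing over iterations.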

3 Proposed Method

3.1 Transfer Learning and Coupled Dictionary Based Layer Division

In Sect. 2.1, the problem was converted into the estimation of \(P(x_1|x_2)\) and \(P(x_2|x_1)\) in (4) and (5). In this model, both images are divided into two layers due to the asymmetry of \(P(x_1|x_2)\) and \(P(x_2|x_1)\).

Let \(D_1\) be the feature dictionary trained from image \(x_1\); it is reasonable to treat \(D_1\) as the source knowledge in transfer learning [18]. When \(x_2\) is reconstructed with \(D_1\), the correlated component can be reconstructed well while the individual component is presumably lost. Applying the same process to \(x_1\) and \(x_2\), the optimization goal can be formulated as follows:

$$\begin{aligned} min \Big \{\Vert X_1 - D_2\cdot A_1\Vert _2 + \Vert X_2 - D_1\cdot A_2\Vert _2 + l(A_1,A_2,D_1,D_2)\Big \} \end{aligned}$$
(6)

\(X_1\) is the sample matrix of \(x_1\), which consists of vectorized patches obtained from sliding windows. \(A_1\) and \(A_2\) are the coefficient matrices. \(l(A_1,A_2,D_1,D_2)\) is a regularization term encoding expected properties of the dictionaries and the reconstruction process. The scheme of layer division is illustrated in Fig. 2.
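Building the sample matrix from sliding windows can be sketched as follows; window size and stride are free parameters here:

```python
import numpy as np

def sample_matrix(img, win=8, stride=1):
    """Build the sample matrix X of Eq. (6): each column is one vectorized
    win x win patch taken from a sliding window over the image."""
    H, W = img.shape
    cols = [img[i:i + win, j:j + win].ravel()
            for i in range(0, H - win + 1, stride)
            for j in range(0, W - win + 1, stride)]
    return np.stack(cols, axis=1)

img = np.arange(25.0).reshape(5, 5)
X = sample_matrix(img, win=3)
# 9-pixel patches, (5 - 3 + 1)^2 = 9 window positions
assert X.shape == (9, 9)
```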

Fig. 2. Flow diagram of layer division based on transfer learning and coupled dictionary

3.2 Feature Extraction and Exchanged Representation

Learning the dictionary D is the process of obtaining a series of features that determine the accuracy of layer division. Unfortunately, as discussed in Sect. 2.1, reconstruction error still influences the result of layer division in the proposed model. Instead of designing a brand-new sophisticated algorithm to seek a better dictionary, inspired by [12], an alternative method is adopted in this paper.

Undoubtedly, original image patches can transfer knowledge between images without losing any information. However, in the knowledge-transfer procedure, original image patches are not good at distinguishing features. To ensure that each component can be presented correctly in any layer image, it is necessary to require that the elements of all layer images be non-negative, for consistency with the original patches.

$$\begin{aligned} \begin{aligned} min \Big \{\Vert X_1 - D_2\cdot A_1\Vert _2 + \Vert X_2 - D_1\cdot A_2\Vert _2 + l(A_1,A_2,D_1,D_2)\Big \},\\ (X_l^h)_{ij}\ge 0;l = 1,2; h=C,I \end{aligned} \end{aligned}$$
(7)

Our optimization formulation is thus changed to the following form, where the logical notation denotes the relation between corresponding elements of the two matrices:

$$\begin{aligned} \begin{aligned} min \Big \{\Vert X_1 - D_2\cdot A_1\Vert _2 + \Vert X_2 - D_1\cdot A_2\Vert _2 + l(A_1,A_2,D_1,D_2)\Big \},\\ (X_l^h)_{ij}\ge 0;l = 1,2; h=C,I \end{aligned} \end{aligned}$$
(8)

A general condition in the image fusion field is that the source images have been pre-registered before further processing, and our work follows this convention. This condition guarantees a property of image fusion: correlated knowledge depicting the same phenomenon exists in the same region of both source images. Therefore, the j-th patch of \(X_1\) only shares correlated knowledge with the j-th patch of \(X_2\). We can rewrite our optimization formulation (7) with this property in the following form.

$$\begin{aligned} \begin{aligned} min&\Big \{\Vert X_1 - D_2\cdot A_1\Vert _2 + \Vert X_2 - D_1\cdot A_2\Vert _2 +\lambda _1 {\sum \limits _{j\,=\,1}^{j\,=\,m} (\Vert \alpha _{1_{j,j}}\Vert _1+\Vert \alpha _{2_{j,j}}\Vert _1-2)} \\&+\lambda _2 {\sum \limits _{j\,=\,1}^{j\,=\,m} (\Vert \alpha _{1_{i,j}}\Vert _1+\Vert \alpha _{2_{i,j}}\Vert _1)}\Big \},i\not =j \end{aligned} \end{aligned}$$
(9)

In this formulation, \(\alpha _{1_{j}}\) is the j-th column of \(A_1\), so \(\alpha _{1_{i,j}}\) is its i-th entry. The first regularization term, weighted by \(\lambda _1\), ensures that a patch accepts knowledge from the corresponding patch of the other image. \(\lambda _2\) controls the tolerance with which a patch accepts knowledge from other patches. The third and fourth regularization terms, the non-negativity penalties carried over from (7), determine the penalty on negative elements of the layer images. When these parameters tend to infinity, the optimum is attained only when the corresponding columns of \(X_1\) and \(X_2\) are equal element-wise.
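The two coupling terms of (9) can be sketched as a penalty on the coefficient matrices. The sketch assumes square \(m \times m\) coefficient matrices (one atom per patch of the other image) and omits the non-negativity penalties of (7) for brevity:

```python
import numpy as np

def coupling_penalty(A1, A2, lam1=1.0, lam2=1.0):
    """Regularizer of Eq. (9): the lam1 term pushes each patch to take its
    knowledge from the corresponding (j-th) patch of the other image; the
    lam2 term penalizes borrowing from any other patch (i != j)."""
    diag1, diag2 = np.abs(np.diag(A1)), np.abs(np.diag(A2))
    off1 = np.abs(A1).sum() - diag1.sum()       # off-diagonal mass of A1
    off2 = np.abs(A2).sum() - diag2.sum()       # off-diagonal mass of A2
    r1 = lam1 * np.sum(diag1 + diag2 - 2)       # diagonal terms pulled to 1
    r2 = lam2 * (off1 + off2)                   # cross-patch leakage
    return r1 + r2

# identity coefficients (each patch explained exactly by its counterpart)
# incur zero penalty; any off-diagonal leakage is penalized
I = np.eye(4)
assert abs(coupling_penalty(I, I)) < 1e-12
A_leaky = I.copy(); A_leaky[0, 2] = 0.5
assert coupling_penalty(A_leaky, I) > 0
```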

The definition of the coupled dictionary problem is similar to (8), except that the positions of \(D_1\) and \(D_2\) are exchanged. In this paper, we propose a novel algorithm to solve this optimization, with reference to the property mentioned in [19].

3.3 Fusion Scheme Base on Proposed Layer Division Method

Indeed, the correlated layer and individual layers store redundant and complementary information respectively. Ideally, the fused image would inherit all redundant and complementary information directly. Due to possible incompatibility between complementary components, a better fusion rule in practice is to choose the more informative one. Furthermore, complementary information is no longer influenced by redundant information when its information content is measured. Hence, the complementary component in the fused image looks enhanced compared with the source images.

Instead of sparse representation, which cannot handle high-frequency information efficiently [7], multi-scale transformation is much better suited to the corresponding layers. Our fusion scheme is designed with DTCWT [2] and NSCT [4], as shown in Fig. 3.
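The fusion rule itself (average the coarse approximation, keep the stronger detail coefficient) can be sketched independently of the transform. The sketch below uses a plain averaging pyramid as a stand-in for DTCWT/NSCT, which require dedicated toolboxes:

```python
import numpy as np

def fuse_multiscale(a, b, levels=2):
    """Max-abs-detail / mean-approximation fusion rule on a simple
    averaging pyramid (a stand-in for the DTCWT/NSCT transforms)."""
    def down(x):   # 2x2 block average: coarse approximation
        return 0.25 * (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2])
    def up(x):     # nearest-neighbour upsampling back to the finer grid
        return np.kron(x, np.ones((2, 2)))
    if levels == 0 or min(a.shape) < 2:
        return 0.5 * (a + b)                        # fuse coarsest level
    la, lb = down(a), down(b)
    da, db = a - up(la), b - up(lb)                 # detail (high-freq) bands
    d = np.where(np.abs(da) >= np.abs(db), da, db)  # keep the stronger detail
    return up(fuse_multiscale(la, lb, levels - 1)) + d

a = np.zeros((8, 8)); a[2, 2] = 4.0   # toy "sensor 1": one bright spot
b = np.zeros((8, 8)); b[5, 5] = 4.0   # toy "sensor 2": another spot
f = fuse_multiscale(a, b)             # both spots survive in the fused image
```

The max-abs rule realizes the "choose the more informative one" principle above; the averaging of the coarsest level corresponds to fusing the redundant approximation.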

Fig. 3. Flow diagram of fusing the divided correlated layer images and individual layer images

4 Experiment

An infrared-visible (IR-VI) image pair and a multi-focus image pair are chosen for the experiments due to their distinctive distributions of individual and correlated components. A comparison of the divided layers between the work in [13] and the proposed method is presented first. Then subjective and objective comparisons of the final fused images are given.

4.1 Layer Division Results and Discussion

The approach of [13] works with sliding windows of size 8\(\,\times \,\)8 and a K-SVD dictionary of size 64\(\,\times \,\)500. In the proposed method, on account of the low similarity of IR-VI images but high similarity of multi-focus images, sliding windows of size 3\(\,\times \,\)3 and 16\(\,\times \,\)16 are applied to them respectively.

In Figs. 5 and 6, the divided layers of the proposed method and of [13] are illustrated. Clearly, sharper patterns emerge in the divided layers of the proposed method, and no clutter remains in them. The individual layers of the proposed method show apparent dissimilarities compared with the JSR approach.

Fig. 4. Source Images

Fig. 5. Division layer images of IR-VI

Fig. 6. Division layer images of multi-focus

Fig. 7. IR-VI Fused Result Comparison (Color figure online)

4.2 Fusion Results and Comparison

The divided layers are finally integrated into the fused image with a multi-scale transformation fusion scheme, so the discrete wavelet transform (DWT) and non-subsampled contourlet transform (NSCT) are chosen for a comparative study. The level of multi-scale decomposition is 4 for both DWT and NSCT. Besides, a comparison among several approaches is carried out with the optimal parameters described in their papers [7, 13].

In accordance with the view that multi-sensor images are used to describe a target comprehensively, complementary information may be more important than redundant information. Because the proposed approach decreases the interference between complementary and redundant information to some degree, some details are enhanced in the results of Figs. 7 and 8. Unlike some image enhancement algorithms, this emphasis on the individual component does not generate artificial components.

In Fig. 7, the proposed method inherits the elliptical structure of the fence in the gallery, the bracket of the left traffic light, and the line on the road, whereas the other approaches lose them because they only retain information from a single source. We mark these regions with red boxes. Besides, the fused image of the proposed method clearly depicts the details of the stools in front of the bar, the window lattice, and the backpack of the pedestrian walking past the bar.

In the fusion scenario consisting of multi-focus images, the proposed method obtains sharper and clearer results than the other methods. Careful comparison of the borders of the focus regions shows that no shadows or over-sharpened edges exist in the results. We also mark some important regions with red boxes. The proposed model makes gradient information more significant, as gradient is an individual component among the source images.

Fig. 8. MF Fused Result Comparison (Color figure online)

Following the work of Liu [21], four objective metrics that emphasize different views are chosen for evaluating and comparing the proposed method. \(Q_{VIFF}\) [22] simulates human vision, while \(Q_e\) comes from information theory. \(Q_{SF}\) [23] presents the richness of gradient information, and \(Q_{SSIM}\) [22] measures the structural similarity between the fused image and the source images.
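Of the four, the spatial-frequency metric is the simplest to state. The sketch below implements one common definition of spatial frequency (root mean square of row-wise and column-wise intensity differences); it is a proxy for the exact normalization used in [23]:

```python
import numpy as np

def spatial_frequency(img):
    """Spatial frequency: sqrt(RF^2 + CF^2), where RF and CF are the RMS of
    horizontal and vertical intensity differences -- a gradient-richness proxy."""
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return np.hypot(rf, cf)

flat = np.full((8, 8), 5.0)              # constant image: no gradients
stripes = np.tile([0.0, 1.0], (8, 4))    # alternating columns: strong gradients
assert spatial_frequency(flat) == 0.0
assert spatial_frequency(stripes) > spatial_frequency(flat)
```

A higher score simply indicates more high-frequency (edge and texture) content in the fused result.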

According to Table 1, the proposed method obtains high scores in \(Q_{VIFF}\), \(Q_e\), and \(Q_{SF}\) in both the IR-VI and multi-focus scenarios. Due to its emphasis on the individual component, the proposed method does not keep a structure fully consistent with any single source image, which decreases its \(Q_{SSIM}\) score (Fig. 8).

Table 1. Objective metric scores of IR-VI fusion and multi-focus fusion

5 Conclusion

In this paper, we proposed a novel approach for dividing an image into a correlated layer and an individual layer. Compared with previous work, our layers better reveal the implicit patterns among the source images. Based on the layer division, we make use of existing multi-scale transformations to fuse the divided layers respectively and combine them into the final fused image. The experimental results show that the proposed method is competitive with state-of-the-art approaches. Since misalignment, noise, and moving objects present differently in each image, a robust approach is the focus of our future work.