Fast saliency-aware multi-modality image fusion
Introduction
Recent advances in imaging, networking, data processing and storage technology have led to a tremendous expansion in the use of multi-modality images and video across a variety of fields. A typical application is surveillance imaging, where the advantages of different imaging sensors are usually combined in order to enhance the capability of vision systems. At the core of such an application is multi-modality image fusion, which combines multiple images captured by different modalities into a single representation. This fused image provides comprehensive information about the scene, so the operator does not need to check each image separately. This is nicely illustrated by fire monitoring based on the combination of IR and ViS images, where the system is expected to locate the fire at an early stage. While a ViS image may allow the operator to readily spot a billowing smoke plume, the actual location of the fire is more easily deduced by inspecting the corresponding hot spot in the IR image. If the two images are combined properly, the operator can see both the bright spot and the smoke in the fused image, and thus quickly and precisely locate the fire.
Image fusion has been studied extensively [1], [2]. Depending on the intended application, different fusion methods have been developed, but two basic research lines have gained prominence: pixel-based and region-based fusion. Pixel-based image fusion combines the images at the pixel level, while region-based image fusion treats pixels constituting the same object as an entity. From the perceptual point of view, region-based fusion is often superior, since meaningful objects attract more attention than incoherent individual pixels. However, applying region-based fusion is not straightforward, for several reasons. Firstly, an assumption underlying region-based fusion is that segmented images from multiple modalities are similar in terms of region location and size. Unfortunately, this does not hold when the two modalities differ significantly, as is the case for long-wavelength IR and ViS images. Secondly, image segmentation is computationally expensive, so fusion based on it is unsuited to applications, such as surveillance imaging, where real-time processing is of paramount importance. Thirdly, region-based fusion treats every segmented region on the same footing, irrespective of the region's saliency, whereas for many applications only a select few regions bear significance.
In this paper, an IR and ViS image/video fusion algorithm is proposed to enhance the visualization of a surveillance imaging system. The core idea is to take the saliency of the region into account during the fusion procedure. Our work differs from the existing work in three aspects.
- We select the ViS image as a sort of reference or iconic image [19], and the fusion is biased in favor of the ViS image. The reason is that the ViS image often provides a familiar impression of the scene, thus reducing the cognitive load for the supervisor who has to recognize or locate the target object.
- Instead of partitioning the image into seamless regions, we only extract regions that are salient for the intended application. In this paper, we perform saliency detection on the IR image, as IR thermography can “see” objects without illumination.
- We cast saliency detection as a classification problem in which a Markov Random Field (MRF) is called upon to harness the co-occurrence of hot spots and motion. This model helps to generate a better saliency map, as the consistency of neighboring pixels is adequately taken into account [8].
In this context, we designate a region as salient if it consistently tends to attract attention from viewers. Obviously, it is difficult to design a generic saliency extraction algorithm that can be applied to a multitude of applications. However, we believe that it is possible to define saliency in operational terms, once the context of an application is provided. Since in our application (wildfire surveillance and monitoring) regions of interest — either fire or humans — tend to be hotter and moving when compared to the background, we define saliency in terms of high IR brightness and motion.
In the sequel, we first overview the literature in Section 2. In Section 3, we present our framework. We further introduce the idea of MRF-based saliency detection on IR image in Section 4. In Section 5, we describe our MR-based (wavelet) biased fusion algorithm. The experimental results are provided in Section 6. Finally, conclusions are drawn in Section 7.
Prior work on image fusion
Several related survey papers [1], [2], [3] for image fusion have appeared over the years, providing a broad overview of over one hundred papers. In keeping with most of the literature, we divide existing techniques into two categories: pixel-based fusion and region-based fusion.
Pixel-based image fusion algorithms [4], [5] combine images at the pixel level. The fusion schemes range from simple spatial pixel-value fusion to more complex transform-domain fusion. The simplest form of the spatial
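The simplest spatial-domain scheme is a pixel-wise weighted average of two registered images. A minimal sketch (the function name and the uniform weight are illustrative assumptions, not taken from [4], [5]):

```python
import numpy as np

def average_fusion(img_a, img_b, w=0.5):
    """Pixel-wise weighted average of two registered grayscale images."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    fused = w * a + (1.0 - w) * b
    return np.clip(fused, 0, 255).astype(np.uint8)

# Example: fusing a bright IR patch with a darker ViS patch
ir = np.full((4, 4), 200, dtype=np.uint8)
vis = np.full((4, 4), 100, dtype=np.uint8)
print(average_fusion(ir, vis)[0, 0])  # 150
```

Such averaging preserves both inputs equally but tends to lower the contrast of salient structures, which motivates the transform-domain and saliency-aware rules discussed later.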
The system overview
Fig. 1 depicts the proposed system architecture with its main functional units and data flows. The functions of the key modules are as follows.
- Hot spot detection in IR. The top 5% of pixels in terms of IR brightness are regarded as hot spot pixels. Based on this, a probabilistic map is generated in which the probability of a hot spot pixel is proportional to its intensity value.
- Motion detection in IR. We exploit background subtraction to extract the motion pixels, assuming the camera is fixed. Here,
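The two detection modules above can be sketched as follows. The 95th-percentile cut-off implements the top-5% rule stated above; the linear intensity-to-probability scaling and the difference threshold `tau` are illustrative assumptions:

```python
import numpy as np

def hot_spot_map(ir):
    """Top 5% brightest IR pixels become hot-spot candidates; within that
    set, the hot-spot probability grows linearly with intensity."""
    ir = ir.astype(np.float64)
    thresh = np.percentile(ir, 95)            # 95th-percentile cut-off
    prob = np.zeros_like(ir)
    mask = ir >= thresh
    span = ir.max() - thresh
    if span > 0:
        prob[mask] = (ir[mask] - thresh) / span  # proportional to intensity
    else:
        prob[mask] = 1.0
    return prob

def motion_map(ir_frame, background, tau=15):
    """Background subtraction for a fixed camera: pixels deviating more
    than tau from the background model are flagged as moving."""
    diff = np.abs(ir_frame.astype(np.int32) - background.astype(np.int32))
    return (diff > tau).astype(np.float64)
```

The two maps feed the MRF model of Section 4 as per-pixel evidence of saliency.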
Markov random fields (MRFs)-based saliency detection
Saliency detection can be interpreted as a pixel labeling problem, where each pixel in the image is labeled as either salient or non-salient. Pixel labeling is a typical classification problem, which can be treated by a Markovian Maximum A-Posteriori (MAP) approximation. Given the observed features f and the configuration of labels l, the posterior probability of l is P(l|f) ∝ P(f|l)P(l). Maximizing the posterior amounts to maximizing the product of the class conditional probability
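As a concrete, simplified instance, MAP labeling under a Potts smoothness prior can be approximated with Iterated Conditional Modes (ICM). This is a standard inference sketch under assumed data and prior terms, not necessarily the optimizer used in the paper:

```python
import numpy as np

def icm_saliency(p_salient, beta=1.0, iters=5):
    """Approximate MAP labeling by Iterated Conditional Modes.
    p_salient: per-pixel probability of the 'salient' class (data term);
    beta weights agreement with the 4-connected neighbourhood (Potts prior)."""
    eps = 1e-9
    # data energies: negative log-likelihoods for labels 0 and 1
    e0 = -np.log(1.0 - p_salient + eps)
    e1 = -np.log(p_salient + eps)
    labels = (p_salient > 0.5).astype(np.int32)   # initial guess
    h, w = labels.shape
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                n = 0; s = 0                       # neighbours / salient ones
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        n += 1
                        s += labels[yy, xx]
                # Potts prior: penalty for each disagreeing neighbour
                cost0 = e0[y, x] + beta * s
                cost1 = e1[y, x] + beta * (n - s)
                labels[y, x] = 1 if cost1 < cost0 else 0
    return labels
```

The neighbourhood term is what enforces the spatial consistency mentioned above: an isolated pixel whose evidence weakly contradicts its surroundings gets relabeled to agree with them.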
Biased MR image fusion
We base our MR image fusion on the wavelet transform, due to the resemblance between its filtering properties and the human vision process. Basically, our fusion consists of four steps: (1) generate the saliency map from the IR image; (2) carry out a wavelet transform of both the IR and ViS images; (3) fuse the coefficients at each scale using different fusion rules; (4) perform the inverse wavelet transform to construct the fused image. Here, we will focus on explaining the coefficient fusion
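Steps (2)–(4) can be sketched with a single-level Haar transform, a minimal stand-in for the paper's multiresolution wavelet. The saliency-weighted approximation rule and the max-magnitude detail rule below are illustrative assumptions, not the authors' exact fusion rules:

```python
import numpy as np

def haar_decompose(img):
    """One-level 2-D Haar transform: approximation (LL) + 3 detail subbands."""
    a = img.astype(np.float64)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0          # row averages
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0          # row differences
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

def haar_reconstruct(ll, lh, hl, hh):
    """Exact inverse of haar_decompose."""
    h, w = ll.shape
    lo = np.zeros((2 * h, w)); hi = np.zeros((2 * h, w))
    lo[0::2, :], lo[1::2, :] = ll + lh, ll - lh
    hi[0::2, :], hi[1::2, :] = hl + hh, hl - hh
    out = np.zeros((2 * h, 2 * w))
    out[:, 0::2], out[:, 1::2] = lo + hi, lo - hi
    return out

def biased_fusion(vis, ir, saliency_ll):
    """Transform both images, bias the approximation band toward ViS
    except where the (downsampled) saliency map is high, keep the
    max-magnitude detail coefficients, then invert."""
    v = haar_decompose(vis)
    r = haar_decompose(ir)
    w = saliency_ll                               # saliency at LL scale, in [0, 1]
    ll = (1.0 - w) * v[0] + w * r[0]              # ViS-biased where w is low
    details = [np.where(np.abs(dv) >= np.abs(dr), dv, dr)
               for dv, dr in zip(v[1:], r[1:])]
    return haar_reconstruct(ll, *details)
```

With a zero saliency map the fused result reverts to the ViS image, matching the ViS-biased design rationale; where saliency is high, the IR approximation coefficients dominate.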
Experimental results
Our proposed system is implemented in C++ on a laptop PC (dual-core 2.53 GHz, 4 GB RAM) with a 64-bit operating system. We have tested our algorithm in two surveillance-related scenarios. In the first scenario, it is required to monitor an area within which a fire may occur.2 In this situation, one may separately observe the hot spot (flame) in the IR image and the smoke in the ViS image. If the two images can be fused properly, one is able to
Conclusion
In this paper, we propose a fast saliency-aware image fusion algorithm inspired by the region-based image fusion concept. The major difference between traditional algorithms and ours is that our algorithm takes the saliency of the object/region into account: the saliency of the region drives the fusion procedure. In order to generate a consistent saliency map from the IR image, we feed the co-occurrence of hot spots and motion into an MRF model. Both objective and
Acknowledgments
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7-ENV-2009-1) under Grant agreement no FP7-ENV-244088 “FIRESENSE—Fire Detection and Management through a Multi-Sensor Network for the Protection of Cultural Heritage Areas from the Risk of Fire and Extreme Weather”.
References (33)
- et al., Wavelet based image fusion techniques: an introduction, review and comparison, ISPRS J. Photogramm. Remote Sensing (2007)
- A general framework for multiresolution image fusion: from pixels to regions, Inf. Fusion (2003)
- et al., Multisensor image fusion using the wavelet transform, Graphical Models Image Process. (1995)
- et al., Background-subtraction using contour-based fusion of thermal and visible imagery, Comput. Vision Image Understanding (2007)
- et al., Bayesian algorithms for adaptive change detection in image sequence using Markov random fields, Signal Process. Image Commun. (1995)
- et al., Visible and infrared image registration in man-made environment employing hybrid visual features, Pattern Recognition Lett. (2013)
- et al., Review article: multisensor image fusion in remote sensing: concepts, methods and applications, Int. J. Remote Sensing (1998)
- H. Irshad, M. Kamran, A. Siddiqui, A. Hussain, Image fusion using computational intelligence: a survey, in: Proceedings...
- P. Burt, E. Adelson, Merging images through pattern decomposition, in: Proceedings of the SPIE, vol. 575, 1985, pp....
- Hierarchical image fusion, Mach. Vision Appl. (1990)
- The Laplacian pyramid as a compact image code, IEEE Trans. Commun.
Jungong Han received his Ph.D. degree in Telecommunication and Information System from XiDian University, China, in 2004. During his Ph.D. study, he spent one year at Internet Media group of Microsoft Research Asia, China. From 2005 to 2010, he was with Signal Processing Systems group at the Technical University of Eindhoven, The Netherlands. In December of 2010, he joined the Centre for Mathematics and Computer Science (CWI) in Amsterdam, as a research staff member. In July of 2012, he started a senior scientist position with Civolution technology in Eindhoven (a combining synergy of Philips Video Content Identification and Thomson STS).
His research interests include multimedia security, multi-sensor data fusion, video content analysis, and computer vision. He has written and co-authored over 60 papers including 3 invited papers in these areas. One of his algorithm implementations has been commercialized and used by a start-up company. He is an associate editor of Elsevier Neurocomputing and Journal of Convergence Section C: Web and Multimedia. He has been (lead) guest editor for several journals, such as IEEE-T-SMC:B and Pattern Recognition Letters. He is a member of IEEE IDSP Standing Committee, and a voting member of IEEE Multimedia Communications Technical Committee.
Eric J. Pauwels joined the computer vision research group at ESAT (Leuven University, Belgium) after completing his Ph.D. in Mathematics, and worked on various mathematical problems in computer vision, including differential, semi-differential and algebraic invariants and their application to object recognition. In 1999, he joined the Signals and Images research group at the Centre for Mathematics and Computer Science (CWI) in Amsterdam where he focuses on two topics: content based image retrieval, and multimodal camera and sensor networks for situational awareness in smart environments. He has contributed to numerous national and European projects and was the scientific coordinator of the FP6 Network of Excellence on Multimedia Understanding through Semantics, Computations and Learning (MUSCLE). He founded and acted as the first chairman for the ERCIM Working Group on Image and Video Understanding. He also organized and chaired the first international workshop on Distributed Sensing and Collective Intelligence in Biodiversity Monitoring.
Paul de Zeeuw is a numerical mathematician, affiliated at the CWI, Amsterdam (NL), since 1979. He studied mathematics and computer science at the University of Leiden and obtained his Ph.D. thesis from the University of Amsterdam. He authored and co-authored many papers on multigrid algorithms for the solution of partial differential equations. One paper in particular is much cited and the accompanying computer code is widely used. He has also been participating in image processing projects, as a spin-off thereof two Matlab toolboxes have been built and made available on the web. Further, he has been author at the Dutch Open University on the topic of numerical linear algebra, and was the secretary of the Dutch-Flemish Numerical Analysis Society from 1997 till 2002, including being editor of its newsletter. He has acted as a reviewer of project proposals. Present focal points are applications of multi-resolution methods in image processing, including image fusion and content-based image retrieval.
1 This work was done while Jungong Han was at CWI.