Information Fusion
Volume 45, January 2019, Pages 346-360

Alternating diffusion maps for multimodal data fusion

https://doi.org/10.1016/j.inffus.2018.01.007

Highlights

  • A nonlinear multimodal data fusion method is proposed.

  • The method is designed to suppress the sensor-specific variables.

  • The method preserves the latent variables measured by two or more sensors.

  • The method relies on multiple applications of alternating diffusion operators.

  • Good performance in automatic sleep stage assessment is demonstrated.

Abstract

The problem of information fusion from multiple data-sets acquired by multimodal sensors has drawn significant research attention over the years. In this paper, we focus on a particular problem setting consisting of a physical phenomenon or a system of interest observed by multiple sensors. We assume that all sensors measure some aspects of the system of interest with additional sensor-specific and irrelevant components. Our goal is to recover the variables relevant to the observed system and to filter out the nuisance effects of the sensor-specific variables. We propose an approach based on manifold learning, which is particularly suitable for problems with multiple modalities, since it aims to capture the intrinsic structure of the data and relies on minimal prior model knowledge. Specifically, we propose a nonlinear filtering scheme, which extracts the hidden sources of variability that are captured by two or more sensors and are independent of the sensor-specific components. In addition to presenting a theoretical analysis, we demonstrate our technique on real measured data for the purpose of sleep stage assessment based on multiple, multimodal sensor measurements. We show that without prior knowledge of the different modalities and of the measured system, our method gives rise to a data-driven representation that is well correlated with the underlying sleep process and is robust to noise and sensor-specific effects.

Introduction

Often, when measuring a phenomenon of interest that arises from a complex dynamical system, a single data acquisition method cannot capture the phenomenon's entire complexity and characteristics, and it is usually prone to noise and interference. Recently, due to technological advances, the use of multiple types of measurement instruments and sensors has become more and more popular; nowadays, such equipment is smaller, less expensive, and can be mounted on everyday products and devices more easily. In contrast to a single sensor, multimodal sensors may capture complementary aspects and features of the measured phenomenon, enabling us to extract a more reliable and detailed description of it.

The vast progress in the acquisition of multimodal data calls for the development of analysis and processing tools that appropriately combine data from the different sensors and handle well the inherent challenges that arise. One particular challenge is related to the heterogeneity of the data acquired in the different modalities; datasets acquired from different sensors may comprise different sources of variability, of which only a few are relevant to the phenomenon of interest. This challenge, as well as many others, has been the subject of many studies. For recent comprehensive reviews, see [1], [2], [3].

In this paper we consider a setting in which a physical phenomenon is measured by multiple sensors. While all sensors measure the same phenomenon, each sensor's output comprises different sources of variability; some are related to the phenomenon of interest, possibly capturing its various aspects, whereas others are sensor-specific and irrelevant. We present an approach based on manifold learning, a class of nonlinear data-driven methods, e.g. [4], [5], [6], [7]; specifically, we use the framework of diffusion maps (DM) [8]. On the one hand, manifold learning is particularly suitable for problems with multiple modalities since it aims to capture the intrinsic geometric structure of the underlying data and relies on minimal prior model knowledge. This makes it possible to handle multimodal data in a systematic manner, without the need to specially tailor a solution for each modality. On the other hand, applying manifold learning to data acquired by multiple (multimodal) sensors may capture undesired/nuisance geometric structures as well. Recently, several manifold learning techniques for multimodal data have been proposed [9], [10], [11], [12]. In [9], the authors suggest concatenating the samples acquired by different sensors into unified vectors. However, this approach is sensitive to the scaling of each dataset, which might be especially diverse among datasets acquired by different modalities. To alleviate this problem, it is proposed in [10] to use DM to obtain a “standardized” representation of each dataset separately, and then to concatenate these “standardized” representations into unified vectors. Although it handles multimodal data better, this concatenation scheme does not utilize the mutual relations and co-dependencies that might exist between the datasets.

While methods such as those presented in [9], [10], [12] take into account all the measured information, the methods presented in [11], [13], [14], [15] use local kernels to implement nonlinear filtering. Specifically, following a recent line of study in which multiple kernels are constructed and combined [16], [17], [18], [19], it was shown in [13], [14] that a method based on alternating applications of diffusion operators extracts only the common source of variability among the sensors, while filtering out the sensor-specific components. Therefore, we choose to base our method on DM, which rests on these theoretical foundations. Other nonlinear methods, such as local CCA [11] and kernel CCA [15], require more effort to fully understand their theoretical foundations; still, they may be used as alternatives to DM and alternating diffusion (AD) and tested empirically as well. The shortcoming of alternating applications of diffusion operators arises when the number of sensors is large; often, sensors that measure the same system capture different information and aspects of that system. As a result, the source of variability common to all the sensors captures only a partial, or even empty, view of the system, and important relevant information may be undesirably filtered out.
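To make the alternating-diffusion construction concrete, the following is a minimal two-sensor sketch, assuming time-aligned samples (row i of X1 and X2 observe the same instance), Gaussian kernels, and Euclidean distances; the function names and parameter values are illustrative and not taken from the authors' implementation.

```python
# Minimal two-sensor sketch of alternating diffusion (AD); illustrative only.
import numpy as np

def diffusion_operator(X, eps):
    """Row-stochastic diffusion operator from a Gaussian kernel on X (N x d)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / eps)
    return W / W.sum(axis=1, keepdims=True)  # rows sum to 1

def alternating_diffusion(X1, X2, eps1=1.0, eps2=1.0, n_coords=3):
    """Compose the two diffusion operators; per [13], [14], the leading
    nontrivial eigenvectors of the product parameterize only the source
    of variability common to both sensors."""
    P = diffusion_operator(X2, eps2) @ diffusion_operator(X1, eps1)
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-np.abs(eigvals))                # decreasing |eigenvalue|
    return np.real(eigvecs[:, order[1:n_coords + 1]])   # skip trivial constant vector
```

Diffusing through one sensor and then the other retains only the variability that both sensors observe, which is the property established in [13], [14] and extended in the present paper.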

Here, we address the tradeoff between these two approaches. That is, we aim to maintain the relevant information captured by multiple sensors, while filtering out the nuisance components. Since the relevance of the various components is unknown, our main assumption is that the sources of variability measured by only a single sensor, i.e., sensor-specific sources, are nuisance. Conversely, we assume that components measured by two or more sensors are of interest. Importantly, such an approach implicitly implements a smart “sensor selection”: “bad” sensors that, for example, malfunction and measure only nuisance information are automatically filtered out. These assumptions stem from the fact that the phenomenon of interest is global and not specific to one sensor. We propose a nonlinear filtering scheme in which only the sensor-specific sources of variability are filtered out, while the sources of variability captured by two or more sensors are preserved.
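The sketch below illustrates one plausible multi-sensor realization of this idea, under the assumption that pairwise alternating-diffusion operators are averaged over all ordered sensor pairs; the paper's exact combination rule is specified in its Section 3, so this should be read as an illustrative sketch rather than the authors' definitive algorithm.

```python
# Hypothetical multi-sensor scheme: average the pairwise AD operators so that
# variability shared by at least two sensors survives, while purely
# sensor-specific variability is suppressed. Assumption, not the paper's rule.
import numpy as np

def diffusion_operator(X, eps):
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / eps)
    return W / W.sum(axis=1, keepdims=True)

def multi_sensor_fusion(sensors, eps=1.0, n_coords=3):
    """sensors: list of (N x d_m) arrays with time-aligned rows."""
    P = [diffusion_operator(X, eps) for X in sensors]
    M = len(P)
    # An average of products of row-stochastic matrices is row-stochastic.
    K = sum(P[j] @ P[i] for i in range(M) for j in range(M) if i != j)
    K = K / (M * (M - 1))
    eigvals, eigvecs = np.linalg.eig(K)
    order = np.argsort(-np.abs(eigvals))
    return np.real(eigvecs[:, order[1:n_coords + 1]])
```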

Based on prior theoretical results [13], [14], we show that our scheme indeed accomplishes this task. We illustrate the main features of our method on a toy problem. In addition, we demonstrate its performance on real measured data in an application for sleep stage assessment based on multiple, multimodal sensor measurements. Sleep is a global phenomenon with systematic physiological dynamics that represents a recurring non-stationary state of mind and body. Sleep evolves in time and embodies interactions between different subsystems, not limited solely to the brain. Thus, in addition to the well-known patterns in electroencephalogram (EEG) signals, its complicated dynamics are manifested in other sensors, such as those measuring breathing patterns, muscle tone and muscular activity, eyeball movements, etc. Each sensor is characterized by different structures and is affected by numerous nuisance processes as well. In other words, while we could extract the sleep dynamics by analyzing different sensors, each sensor captures only part of the entire sleep process while introducing modality-specific artifacts, noise, and interference. We show that our scheme allows for accurate systematic sleep stage identification based on multiple EEG recordings as well as multimodal respiration measurements. In addition, we demonstrate its capability to perform sensor selection by artificially adding noise sensors.

The remainder of the paper is organized as follows. In Section 2 we formulate the common source extraction problem and present an illustrative toy problem. In Section 3, a brief review of the method proposed in [13], [14] is given, followed by a detailed description and interpretation of the proposed scheme. In Section 4, we first demonstrate the capabilities of the proposed scheme on the toy problem introduced in Section 2. Then, in Section 5, we demonstrate its performance in sleep stage identification based on multimodal data recorded in a sleep clinic. Finally, in Section 6, we outline several conclusions.

Section snippets

Problem setting

Consider a system driven by a set of $K$ hidden random variables $\Theta = \{\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(K)}\}$, where $\theta^{(k)} \in \mathbb{R}^{d_k}$. The system is measured by $M$ observable variables $s^{(m)}$, $m = 1, \ldots, M$, where each sensor has access to only a partial view of the entire system and its driving variables $\Theta$. To formalize this, we define a “sensitivity table” given by the binary matrix $S \in \mathbb{Z}_2^{K \times M}$, indicating the variables sensed by each observable variable. Specifically, the $(k,m)$-th element of $S$ indicates whether the hidden variable $\theta^{(k)}$ is
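As a concrete illustration, the sensitivity table can be encoded as a binary array; the values below are hypothetical and chosen only to show how the relevance of each hidden variable is determined.

```python
# Hypothetical sensitivity table S (K = 3 hidden variables, M = 3 sensors);
# entry (k, m) = 1 means sensor m senses hidden variable theta^(k).
import numpy as np

S = np.array([[1, 1, 0],   # theta^(1) is sensed by sensors 1 and 2
              [0, 1, 1],   # theta^(2) is sensed by sensors 2 and 3
              [1, 0, 1]])  # theta^(3) is sensed by sensors 1 and 3

# Under the paper's assumption, variables sensed by two or more sensors are
# of interest; variables sensed by exactly one sensor are nuisance.
of_interest = S.sum(axis=1) >= 2
print(of_interest)  # here: [ True  True  True ]
```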

Diffusion maps

DM is a nonlinear data-driven dimensionality reduction method [8]. Assume we have $N$ high-dimensional data points $\{s_i\}_{i=1}^{N}$. The DM method begins with the calculation of a pairwise affinity matrix based on a local kernel, often using some metric within a Gaussian kernel, i.e.,
$$W_{i,j} = \exp\left(-\frac{d_M\big(s_i^{(1)}, s_j^{(1)}\big)^2}{\varepsilon}\right),$$
where $\varepsilon > 0$ is a tunable kernel scale and $d_M(\cdot,\cdot)$ is a metric. The choice of the metric $d_M(\cdot,\cdot)$ depends on the application; common choices are the Euclidean and the Mahalanobis
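A minimal sketch of this pipeline, assuming the Euclidean metric for $d_M$; the function name and default parameters are illustrative.

```python
# Minimal diffusion maps (DM) sketch with a Euclidean-distance Gaussian kernel.
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_maps(X, eps, n_coords=2, t=1):
    """Embed the N rows of X into n_coords diffusion coordinates."""
    W = np.exp(-cdist(X, X, 'sqeuclidean') / eps)  # pairwise affinity matrix
    P = W / W.sum(axis=1, keepdims=True)           # row-stochastic operator
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-np.abs(eigvals))
    keep = order[1:n_coords + 1]                   # drop the trivial eigenvector
    return np.real(eigvecs[:, keep] * eigvals[keep] ** t)
```

The leading nontrivial eigenvectors of the row-stochastic operator, scaled by their eigenvalues raised to the diffusion time $t$, serve as the low-dimensional diffusion coordinates.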

Simulation results

Consider the toy problem described in Section 2.1. We simulate 6 hidden scalar variables: 3 common variables ($\theta^{(1)}$, $\theta^{(2)}$, $\theta^{(3)}$) and 3 nuisance variables ($n^{(1)}$, $n^{(2)}$, $n^{(3)}$). The variables are statistically independent and uniformly distributed in $[0, 2\pi]$. We then build 3 sets of $N$ RGB images: $\{r_i^{(1)}\}, \{r_i^{(2)}\}, \{r_i^{(3)}\}$, $i = 1, \ldots, N$. The sensitivity table of this example is given by
$$S^T = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.$$

Each image contains 3 arrows, where each arrow is rotated according to a randomly generated angle: the angles
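A sketch of this data generation, under the simplifying assumption that each sensor outputs the raw angles it senses rather than rendered RGB arrow images; the variable-to-sensor assignment follows $S^T$ above.

```python
# Simplified toy data generation (angles in place of rendered RGB images).
import numpy as np

rng = np.random.default_rng(0)
N = 2000
theta = rng.uniform(0, 2 * np.pi, size=(N, 3))  # common variables
n = rng.uniform(0, 2 * np.pi, size=(N, 3))      # sensor-specific nuisance

# Rows of S^T: sensor 1 senses theta^(1), theta^(2); sensor 2 senses
# theta^(2), theta^(3); sensor 3 senses theta^(1), theta^(3). Each sensor
# also carries its own nuisance angle.
s1 = np.column_stack([theta[:, 0], theta[:, 1], n[:, 0]])
s2 = np.column_stack([theta[:, 1], theta[:, 2], n[:, 1]])
s3 = np.column_stack([theta[:, 0], theta[:, 2], n[:, 2]])

# Mapping angles to (cos, sin) pairs respects the 2*pi wraparound before
# applying Euclidean-distance kernels.
sensors = [np.hstack([np.cos(s), np.sin(s)]) for s in (s1, s2, s3)]
```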

Application to sleep stage assessment

As mentioned above, the problem of extracting the common hidden variables from multiple data sets taken by different observables can be perceived as a problem of nonlinear filtering. To demonstrate the potential of this particular nonlinear filtering scheme in processing real data, we apply the proposed algorithm to sleep data, where the ultimate goal is to devise an automatic system for sleep stage assessment.

Sleep is a global and recurrent physiological process, which is in charge of the

Conclusions

In this paper, we propose a new algorithm for fusing information measured by multiple, multimodal sensors. The primary focus is on a setting in which all sensors observe the same system, but each introduces different variables – some are related to various aspects of the system of interest, whereas others are sensor-specific and irrelevant. We present a nonlinear data fusion scheme for suppressing the sensor-specific variables while preserving the system variables measured by two or more

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful suggestions. The authors wish to thank Ofer Karp and Idan Amir for sharing their insights from prior contributions on the subject and for making their code available. Hau-tieng Wu acknowledges the support of Sloan Research Fellowship FR-2015-65363. This research was partly supported by the European Union Seventh Framework Programme (FP7) under Marie Curie Grant 630657 and by the Israel Science Foundation (grant no. 1490/16).

References (45)

  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)

  • M. Davenport et al., Joint manifolds for data fusion, IEEE Trans. Image Process. (2010)

  • Y. Keller et al., Audio-visual group recognition using diffusion maps, IEEE Trans. Signal Process. (2010)

  • O. Yair et al., Local canonical correlation analysis for nonlinear common variables discovery, IEEE Trans. Signal Process. (2017)

  • M. Salhov, O. Lindenbaum, A. Silberschatz, Y. Shkolnisky, A. Averbuch, Multi-view kernel consensus for data analysis...

  • R.R. Lederman et al., Learning the geometry of common latent variables using alternating-diffusion, Appl. Comput. Harmon. Anal. (2015)

  • R. Talmon, H.-T. Wu, Latent common manifold learning with alternating diffusion: analysis and applications, arXiv:...

  • P.L. Lai et al., Kernel and nonlinear canonical correlation analysis, Int. J. Neural Syst. (2000)

  • V.R. de Sa, Spectral clustering with two views, ICML Workshop on Learning with Multiple Views (2005)

  • V.R. de Sa et al., Multi-view kernel construction, Mach. Learn. (2010)

  • B. Boots et al., Two-manifold problems with applications to nonlinear system identification, Proc. 29th Int. Conf. on Machine Learning (ICML) (2012)

  • T. Michaeli, W. Wang, K. Livescu, Nonparametric canonical correlation analysis, arXiv:1511.04839...