Elsevier

Information Fusion

Volume 57, May 2020, Pages 115-129

Full Length Article
A survey on machine learning for data fusion

https://doi.org/10.1016/j.inffus.2019.12.001

Highlights

  • We sum up a group of main challenges that data fusion might face.

  • We propose a thorough list of requirements to evaluate data fusion methods.

  • We review the literature of data fusion based on machine learning.

  • We comment on how a machine learning method can ameliorate fusion performance.

  • We present significant open issues and valuable future research directions.

Abstract

Data fusion is a prevalent way to deal with imperfect raw data in order to capture reliable, valuable and accurate information. Compared with a range of classical probabilistic data fusion techniques, machine learning, which automatically learns from past experience without being explicitly programmed, has remarkably renewed fusion techniques by offering strong computing and predicting abilities. Nevertheless, the literature still lacks a thorough review of recent advances in machine learning for data fusion. It is therefore beneficial to review and summarize the state of the art in order to gain deep insight into how machine learning can benefit and optimize data fusion. In this paper, we provide a comprehensive survey of data fusion methods based on machine learning. We first offer a detailed introduction to the background of data fusion and machine learning in terms of definitions, applications, architectures, processes, and typical techniques. Then, we propose a number of requirements and employ them as criteria to review and evaluate the performance of existing fusion methods based on machine learning. Through literature review, analysis and comparison, we finally identify a number of open issues and propose future research directions in this field.

Introduction

In the era of information explosion, huge volumes of data are created, collected and processed. We can extract valuable information from data to look for the rules of the world and to discover the nature of things. Instead of relying on experience or intuition, we are more likely, and feel more confident, to draw a conclusion or make a decision on the basis of real-world data. However, big data also bring difficulties and challenges to data-driven service provision because of their “5V” characteristics: Volume, Variety, Velocity, Veracity and Value. Traditional data processing techniques in the literature can hardly meet the demands of the new era of big data. How to capture reliable, valuable and accurate information from massive data is one of the most significant research topics nowadays.

The cyber world brings us an overwhelming amount of data to handle. However, raw data captured from various environments are heterogeneous, complex, imperfect, and of a huge scale, which makes it challenging to transform them into useful information. All kinds of data processing technologies, including but not limited to data preprocessing, data storage, data transfer, data fusion, data analysis and information retrieval, aim to solve these problems and stem from diverse processing theories. In this paper, we focus on data fusion. It is a technology that merges data to obtain more consistent, informative and accurate information than the original raw data, which are mostly uncertain, imprecise, inconsistent, conflicting and the like. A variety of data fusion methods have been designed in different application fields. Generally, data fusion is widely used in wireless sensor networks, image processing, radar systems, object tracking, target detection and identification, intrusion detection, situation assessment, etc. [1].

Traditional data fusion techniques include probabilistic fusion (e.g., Bayesian fusion), evidential belief reasoning fusion (e.g., Dempster-Shafer theory), rough set-based fusion, etc. [2]. In recent years, the development of sensors, processing hardware and many other data processing technologies has brought new development opportunities to data fusion. As a technique with strong abilities to compute and classify data, machine learning is highly expected to improve the overall performance of data fusion algorithms.
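To make the notion of probabilistic (Bayesian) fusion concrete, the following is a minimal sketch and is not taken from the surveyed works. It assumes two sensors measure the same scalar quantity with independent Gaussian noise of known variance and that no prior knowledge is available (a flat prior). Under these assumptions the posterior is again Gaussian, and its mean is the inverse-variance weighted average of the two readings. The function name and the example values are hypothetical.

```python
def fuse_gaussian(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian estimates of the same quantity.

    Assumes a flat prior, so the posterior precision (inverse variance)
    is the sum of the two measurement precisions.
    """
    precision = 1.0 / var_a + 1.0 / var_b
    fused_var = 1.0 / precision
    # Each reading is weighted by its own precision.
    fused_mean = fused_var * (mean_a / var_a + mean_b / var_b)
    return fused_mean, fused_var


# Hypothetical readings: a noisy sensor (variance 4.0) and a better one (variance 1.0).
mean, var = fuse_gaussian(20.0, 4.0, 22.0, 1.0)
print(mean, var)  # approximately 21.6 and 0.8
```

The fused variance is smaller than either input variance, which illustrates why merging sources can yield more accurate information than any single raw reading.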

Machine learning is a technique that lets the computer “learn” from provided data without thorough and explicit programming for every problem. It aims at modeling profound relationships in data inputs and reconstructing a knowledge scheme. The result of learning can be used for estimation, prediction, and classification. The name “machine learning” was first proposed in 1959 [3]. In the decades since, advances in the computing power of digital computers have notably improved the performance of machine learning. Machine learning enables classification and prediction based on known data and can achieve high accuracy and reliability, which makes it more likely to inform a correct decision. In recent years, machine learning has been applied to data fusion to improve its performance and offer satisfactory fusion results.

Several surveys on data fusion have been published in recent years with different emphases. Alam et al. [4] completed a literature review on data fusion in IoT, which covers mathematical fusion methods such as probabilistic methods, artificial intelligence, and theory of belief in the domain of IoT. Focusing on IoT narrows the scope of that review, whereas data fusion with machine learning covers a much wider area. Gite et al. [5] and Snidaro et al. [58] focused on data fusion models used in context-aware systems. Pires et al. [6] summarized the state of the art of data fusion techniques for sensors embedded in mobile devices. Navarro-Arribas and Torra [7] reviewed information fusion approaches for achieving data privacy. Faouzi et al. [8] concentrated on the application of data fusion models in intelligent transportation systems. Corona et al. [9] studied information fusion methods for computer security. Yao et al. [10] gave an overview of web information fusion and integration. Ding et al. [76] reviewed data fusion methods in the Internet of Things, mainly focusing on secure and privacy-preserving fusion. The above surveys thus have different focuses from the survey presented in this paper.

On the other hand, some works provide an overview of machine learning in specific application scenarios, especially in environments related to big data processing. For example, Liao et al. [11] surveyed machine learning applications and achievements over the decade 2000–2011. Rudin and Wagstaff [12] reviewed the advances of machine learning in real-world problems of science and society. Qiu et al. [13] studied machine learning for big data processing. They pointed out five significant issues in learning from big data through a literature review. Zhang et al. [14] reviewed representative works of deep learning in big data.

In summary, we can find many existing surveys about data fusion and machine learning from various perspectives. However, given the fast growth of artificial intelligence-based fusion models and their excellent properties, a survey specific to data fusion based on machine learning is still lacking. Although Alam et al. [4] provided a review on data fusion techniques with artificial intelligence, they only paid attention to the literature about data fusion in the Internet of Things. Their review is limited with regard to the scope of models. A horizontal comparison with detailed analysis is still missing. Considering the recent advances in machine learning, it becomes essential to comprehend the elementary knowledge, current state of application and future trends of this field with the help of a thorough survey.

In this paper, we perform a thorough survey on data fusion techniques with machine learning. We first comprehensively introduce basic definitions and background knowledge about machine learning and data fusion. Then, we indicate critical challenges of data fusion and propose a number of criteria for data fusion. We give an in-depth overview of data fusion techniques based on machine learning by commenting on the performance of each reviewed work against these criteria. Through analysis, discussion and comparison, we find some open problems, which further allow us to indicate several research directions to motivate future work in this promising field. In particular, the main contributions of this paper are described below:

  • We sum up a group of main challenges that data fusion might face. Then, we propose a thorough list of requirements as uniform criteria that can serve as a measure to evaluate the performance of data fusion methods based on machine learning.

  • We review the literature of data fusion based on machine learning in various application scenarios and discuss the advantages and weaknesses of each work in detail according to the proposed criteria. For each reviewed work, we comment in particular on how the machine learning method can improve fusion performance.

  • Based on the completed review and in-depth analysis, we present some significant open issues and valuable future research directions, which can serve as a useful reference for researchers and practitioners in this field.

The remainder of the paper is organized as follows. We provide an overview of background knowledge of data fusion and machine learning in Section 2. To review the literature comprehensively with a uniform measure, we propose a number of criteria on data fusion in Section 3. Section 4 reviews the recent literature on data fusion with machine learning, categorized into three classes: signal level data fusion, feature level data fusion and decision level data fusion. All the works are reviewed with respect to their model structures, application background and technical advantages. In addition, we discuss their performance with the help of the proposed criteria. We also summarize the overall comparison of all the reviewed models and methods in this section. In Section 5, we point out open issues and propose future research directions in this field based on the results of the literature review. Finally, conclusions are provided in the last section.
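As a rough illustration of the three fusion levels named above, the sketch below applies each of them to toy data. It is only an assumption-laden example and not a method from the reviewed literature: signal level fusion is shown as point-wise averaging of time-aligned raw samples, feature level fusion as concatenation of per-source feature vectors, and decision level fusion as a majority vote over per-source decisions. All function names, features and values are hypothetical.

```python
from collections import Counter
from statistics import mean


def signal_level_fusion(signal_a, signal_b):
    """Combine raw, time-aligned samples point by point (here: averaging)."""
    return [(a + b) / 2.0 for a, b in zip(signal_a, signal_b)]


def extract_features(signal):
    """Toy feature extractor: mean value and peak-to-peak range."""
    return [mean(signal), max(signal) - min(signal)]


def feature_level_fusion(signal_a, signal_b):
    """Concatenate per-source feature vectors into one vector for a single model."""
    return extract_features(signal_a) + extract_features(signal_b)


def decision_level_fusion(decisions):
    """Combine per-source decisions by majority vote.

    Ties are resolved in favor of the label that was seen first.
    """
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical readings from two sensors observing the same process.
sensor_a = [1.0, 2.0, 3.0]
sensor_b = [3.0, 2.0, 1.0]

fused_signal = signal_level_fusion(sensor_a, sensor_b)
fused_features = feature_level_fusion(sensor_a, sensor_b)
# Hypothetical outputs of three independent per-source classifiers.
fused_decision = decision_level_fusion(["normal", "fault", "fault"])
```

In a machine learning setting, the fused signal or the fused feature vector would typically be fed into a single learned model, whereas decision level fusion combines the outputs of several models trained per source.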

Section snippets

Overview of data fusion and machine learning

This section provides background information and concepts related to data fusion. It also specifies the challenges of data fusion and briefly introduces machine learning and its common models.

Criteria of machine learning for data fusion

In this section, we list the criteria that a data fusion model or algorithm should satisfy, in order to employ them as evaluation metrics when reviewing the literature in the next section. In what follows, the terms data fusion model, method and algorithm are used interchangeably, with the same or similar meaning, unless otherwise noted. Facing the challenges mentioned in Section 2.3, we propose a list of criteria to comprehensively and thoroughly evaluate the performance of data fusion.

(1) Efficiency (Ef):

Machine learning for data fusion

In this section, we review the state of the art of machine learning for data fusion by classifying the current works into three categories: signal level data fusion, feature level data fusion and decision level data fusion. In each category, we review the literature based on the type of machine learning. For each work, we summarize its main contributions and characteristics, and comment on its performance based on the proposed criteria. At the end, we summarize and compare all the reviewed

Open issues and future research directions

Based on the detailed survey reported in Section 4, we further indicate a number of open issues and suggest some future research directions.

Conclusions

This paper has presented a comprehensive review of the literature on machine learning for data fusion. We first provided basic background knowledge about data fusion and machine learning. We further proposed a number of criteria to evaluate the works reviewed in this paper, for the purpose of commenting on their pros and cons. We carefully reviewed the recent literature based on the level of fusion involved and the type of machine learning, and then used a table to summarize our main

Declaration of competing interest

None.

Acknowledgments

This work is sponsored by the NSFC (grants 61672410, 61802293 and U1536202), the Academy of Finland (grants 308087 and 314203), the National Postdoctoral Program for Innovative Talents (grant BX20180238), the China Postdoctoral Science Foundation (grant 2018M633461), the Fundamental Research Funds for the Central Universities (grant JB191504), the Shaanxi Innovation Team project (grant 2018TD-007), and the 111 project (grant B16037).

References (82)

  • L. Snidaro et al.

    Context-based information fusion: a survey and discussion

    Inf. Fusion

    (2015)
  • J.Z. Wang et al.

    An approach to rank reviews by fusing and mining opinions based on review pertinence

    Inf. Fusion

    (2015)
  • W.X. Ding et al.

    A survey on data fusion in Internet of Things: towards secure and privacy-preserving fusion

    Inf. Fusion

    (2019)
  • X.Y. Jing et al.

    Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch

    Inf. Fusion

    (2019)
  • Z. Yan et al.

    Big data fusion in Internet of Things

    Inf. Fusion

    (2018)
  • Z. Yan et al.

    Trustworthy data fusion and mining in Internet of Things

    Future Generat. Comput. Syst.

    (2015)
  • J. Liu et al.

    Fusion - an aide to data mining in Internet of Things

    Inf. Fusion

    (2015)
  • X.Y. Jing et al.

    A reversible sketch-based method for detecting and mitigating amplification attacks

    J. Netw. Comput. Appl.

    (2019)
  • D.L. Hall et al.

    An introduction to multisensor data fusion

    Proc. IEEE

    (1997)
  • C. Federico

    A review of data fusion techniques

    Sci. World J.

    (2013)
  • A.L. Samuel

    Some studies in machine learning using the game of checkers. I

    Comput. Games I

    (1988)
  • F. Alam et al.

    Data fusion and IoT for smart ubiquitous environments: a survey

    IEEE Access

    (2018)
  • S. Gite et al.

    On context awareness for multisensor data fusion in IoT

    Springer India

    (2016)
  • I.M. Pires et al.

    From data acquisition to data fusion: a comprehensive review and a roadmap for the identification of activities of daily living using mobile devices

    Sensors

    (2016)
  • N. Faouzi et al.

    Data fusion in intelligent transportation systems: progress and challenges – A survey

    Inf. Fusion

    (2012)
  • S. Liao

    Data mining techniques and applications – a decade review from 2000 to 2011

    Expert Syst. Appl.

    (2012)
  • C. Rudin et al.

    Machine learning for science and society

    Mach. Learn.

    (2014)
  • J. Qiu

    A survey of machine learning for big data processing

    EURASIP J. Adv. Signal Process.

    (2016)
  • F.E. White, Data Fusion Lexicon,...
  • C.L. Bowman and M.S. Murphy, Description of the VERAC NSource tracker/correlator, Naval Res Lab. Report R-01O-80,...
  • C.L. Bowman et al.

    Multisensor fusion of target attributes and kinematics

  • R. Luo et al.

    Multisensor integration and fusion: issues and approaches

    SPIE Sensor Fusion

    (1988)
  • B. Dasarathy

    Sensor fusion potential exploitation-innovative architectures and illustrative applications

    Proc. IEEE

    (1997)
  • Alan N Steinberg et al.

    Revisions to the JDL data fusion model

    Proc. SPIE - Int. Soc. Opt. Eng.

    (1999)
  • S. Ayed et al.

    Data fusion architectures: a survey and comparison

  • X. Jing et al.

    Security data collection and data analytics in the Internet: a survey

    IEEE Commun. Surv. Tutor.

    (2018)
  • S. Xiao et al.

    Alert fusion based on cluster and correlation analysis

  • X. Guo et al.

    An anomaly detection based on data fusion algorithm in wireless sensor networks

    Int. J. Distrib. Sens. Netw.

    (2015)
  • N. Ghosh

    Estimation of tool wear during CNC milling using neural network-based sensor fusion

    Mech. Syst. Signal Process.

    (2017)
  • S. Challa et al.

    Distributed data fusion using support vector machines

    Int. Conf. Inf. Fusion

    (2013)
  • M.S. Fahmy

    Biometric fusion using enhanced SVM classification
