Full Length Article
A survey on machine learning for data fusion
Introduction
In the era of information explosion, huge volumes of data are created, collected and processed. From these data we can extract valuable information that helps reveal the rules of the world and the nature of things. Rather than relying on experience or intuition, we are increasingly inclined, and more confident, to draw conclusions or make decisions on the basis of real-world data. However, big data also bring difficulties and challenges to data-driven service provision because of their “5V” characteristics: Volume, Variety, Velocity, Veracity and Value. Traditional data processing techniques struggle to meet the demands of this new era of big data. Capturing reliable, valuable and accurate information from massive data is therefore one of the most significant research topics today.
The cyber world produces more data than we can easily handle. Moreover, raw data captured from various environments are heterogeneous, complex, imperfect and of huge scale, which makes it challenging to transform them into useful information. Many data processing technologies, including but not limited to data preprocessing, data storage, data transfer, data fusion, data analysis and information retrieval, play a major role in solving these problems, and they stem from diverse processing theories. In this paper, we focus on data fusion: a technology that merges data to obtain information that is more consistent, informative and accurate than the original raw data, which are often uncertain, imprecise, inconsistent or conflicting. A variety of data fusion methods have been designed for different application fields. Data fusion is widely used in wireless sensor networks, image processing, radar systems, object tracking, target detection and identification, intrusion detection, situation assessment, etc. [1].
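As a minimal illustration of how merging data can yield more accurate information than any single raw source, the sketch below fuses independent noisy readings of one quantity by inverse-variance weighting. This is a standard estimator under the assumption of independent, unbiased Gaussian noise; the function name `inverse_variance_fusion` and the sensor values are hypothetical and not taken from the paper.

```python
from typing import Sequence, Tuple


def inverse_variance_fusion(
    readings: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Fuse independent, unbiased readings of the same quantity.

    Each reading is weighted by the inverse of its noise variance, so more
    precise sources contribute more. Returns (fused_value, fused_variance).
    """
    if not readings or len(readings) != len(variances):
        raise ValueError("need exactly one variance per reading")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused_value = sum(w * x for w, x in zip(weights, readings)) / total
    # The fused variance 1/total is never larger than the smallest input variance.
    return fused_value, 1.0 / total


# Hypothetical example: two temperature sensors observe the same room.
value, variance = inverse_variance_fusion([21.0, 23.0], [1.0, 4.0])
print(f"fused value = {value:.2f}, fused variance = {variance:.2f}")
```

Here the example prints a fused value of 21.40 with variance 0.80, which is lower than the variance of either sensor alone (1.0 and 4.0).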
Traditional data fusion techniques include probabilistic fusion (e.g., Bayesian fusion), evidential belief reasoning fusion (e.g., Dempster-Shafer theory) and rough set-based fusion [2]. In recent years, advances in sensors, processing hardware and many other data processing technologies have brought new development opportunities to data fusion. With its strong ability to compute on and classify data, machine learning is widely expected to improve the overall performance of data fusion algorithms.
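To make the evidential belief reasoning approach concrete, the sketch below implements Dempster's rule of combination from Dempster-Shafer theory: masses of intersecting focal sets are multiplied and accumulated, the mass landing on the empty set is the conflict K, and the result is renormalized by 1 - K. The two-detector intrusion scenario and all mass values are hypothetical, chosen only for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet

# A basic probability assignment maps focal sets (subsets of the frame) to mass.
Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two independent bodies of evidence with Dempster's rule."""
    combined: Mass = {}
    conflict = 0.0  # K: total mass that falls on the empty set
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict and cannot be combined")
    # Normalize by 1 - K so the combined masses again sum to one.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical two-detector example over the frame {intrusion, normal}.
INTRUSION, NORMAL = frozenset({"intrusion"}), frozenset({"normal"})
THETA = INTRUSION | NORMAL  # "don't know": mass on the whole frame
detector_1 = {INTRUSION: 0.6, THETA: 0.4}
detector_2 = {INTRUSION: 0.5, NORMAL: 0.3, THETA: 0.2}
fused = dempster_combine(detector_1, detector_2)
for focal_set, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal_set), round(mass, 3))
```

In this example the conflict is K = 0.18, and after normalization the mass on "intrusion" rises to about 0.756, higher than either detector assigned on its own.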
Machine learning is a technique that lets a computer “learn” from provided data without being explicitly programmed for every problem. It aims to model deep relationships in the input data and to build a knowledge scheme from them. The results of learning can be used for estimation, prediction and classification. The term “machine learning” was first introduced in 1959 [3]. Over the following decades, growth in the computational power of digital computers has notably improved the performance of machine learning. Machine learning enables classification and prediction based on known data and can achieve high accuracy and reliability, which makes it more likely to support correct decisions. In recent years, machine learning has been applied to data fusion to improve its performance and deliver satisfactory fusion results.
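One simple way machine learning can enter data fusion is by learning how much to trust each source from labeled data instead of fixing the fusion weights by hand. The sketch below is an illustrative assumption, not a method from the reviewed works: it fits two fusion weights by ordinary least squares (no intercept) on hypothetical synthetic sensor data. The helpers `fit_fusion_weights` and `mse` are made-up names.

```python
import random
from typing import List, Sequence, Tuple


def fit_fusion_weights(
    a: Sequence[float], b: Sequence[float], target: Sequence[float]
) -> Tuple[float, float]:
    """Learn weights (w1, w2) minimizing sum((w1*a + w2*b - target)**2)."""
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(x * z for x, z in zip(a, b))
    sat = sum(x * t for x, t in zip(a, target))
    sbt = sum(z * t for z, t in zip(b, target))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("readings are collinear; weights are not identifiable")
    # Solve the 2x2 normal equations [[saa, sab], [sab, sbb]] w = [sat, sbt].
    w1 = (sat * sbb - sab * sbt) / det
    w2 = (saa * sbt - sab * sat) / det
    return w1, w2


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and ground truth."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Hypothetical training data: a precise sensor and a noisier one.
random.seed(0)
truth: List[float] = [random.uniform(10.0, 30.0) for _ in range(200)]
sensor_a = [t + random.gauss(0.0, 1.0) for t in truth]
sensor_b = [t + random.gauss(0.0, 2.0) for t in truth]

w1, w2 = fit_fusion_weights(sensor_a, sensor_b, truth)
fused = [w1 * x + w2 * z for x, z in zip(sensor_a, sensor_b)]
print(f"w1={w1:.3f} w2={w2:.3f}")
print(f"MSE a={mse(sensor_a, truth):.3f} b={mse(sensor_b, truth):.3f} "
      f"fused={mse(fused, truth):.3f}")
```

Because the weight pairs (1, 0) and (0, 1) are among the candidates that least squares searches over, the fused training error can never exceed that of either sensor alone. On unseen data this holds only approximately, so held-out data should be used to check generalization.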
Several surveys on data fusion with different emphases have been published in recent years. Alam et al. [4] reviewed the literature on data fusion in IoT, covering mathematical fusion methods such as probabilistic methods, artificial intelligence and belief theory in the IoT domain. Their focus on IoT narrows the scope of the review, whereas data fusion with machine learning covers a much wider area. Gite et al. [5] and Snidaro et al. [58] focused on data fusion models used in context-aware systems. Pires et al. [6] summarized the state of the art of data fusion techniques for sensors embedded in mobile devices. Navarro-Arribas and Torra [7] reviewed information fusion approaches for achieving data privacy. Faouzi et al. [8] concentrated on applications of data fusion models in intelligent transportation systems. Corona et al. [9] studied information fusion methods for computer security. Yao et al. [10] gave an overview of web information fusion and integration. Ding et al. [76] reviewed data fusion methods in the Internet of Things, mainly focusing on secure and privacy-preserving fusion. These surveys thus differ in focus from the survey presented in this paper.
On the other hand, some works provide overviews of machine learning in specific application scenarios, especially those related to big data processing. For example, Liao et al. [11] surveyed data mining techniques, applications and achievements over the decade 2000–2011. Rudin and Wagstaff [12] reviewed the advances of machine learning in real-world problems of science and society. Qiu et al. [13] studied machine learning for big data processing and, through a literature review, identified five significant issues in learning from big data. Zhang et al. [14] reviewed representative works of deep learning for big data.
In summary, many existing surveys address data fusion and machine learning from various perspectives. However, despite the fast growth of artificial intelligence-based fusion models and their excellent properties, a survey dedicated to machine learning-based data fusion is still lacking. Although Alam et al. [4] reviewed data fusion techniques with artificial intelligence, they considered only the literature on data fusion in the Internet of Things, which limits the range of models covered, and a horizontal comparison with detailed analysis is still missing. Given the recent advances in machine learning, a thorough survey is essential for understanding the fundamentals, current applications and future trends of this field.
In this paper, we present a thorough survey of data fusion techniques based on machine learning. We first introduce the basic definitions and background knowledge of machine learning and data fusion. Then, we identify critical challenges of data fusion and propose a number of criteria for it. Next, we provide an in-depth overview of machine learning-based data fusion techniques, using these criteria to comment on the performance of each reviewed work. Through analysis, discussion and comparison, we identify open problems, which further allow us to point out several directions to motivate future research in this promising field. The main contributions of this paper are as follows:
- We summarize the main challenges that data fusion may face. We then propose a thorough list of requirements that serve as uniform criteria for evaluating the performance of machine learning-based data fusion methods.
- We review the literature on machine learning-based data fusion in various application scenarios and discuss the advantages and weaknesses of each work in detail according to the proposed criteria. For each reviewed work, we comment in particular on how the machine learning method improves fusion performance.
- Based on the completed review and in-depth analysis, we present significant open issues and valuable future research directions, which can serve as a reference for researchers and practitioners in this field.
The remainder of the paper is organized as follows. Section 2 provides an overview of background knowledge on data fusion and machine learning. To review the literature comprehensively with a uniform measure, we propose a number of criteria for data fusion in Section 3. Section 4 reviews the recent literature on data fusion with machine learning, categorized into three classes: signal level data fusion, feature level data fusion and decision level data fusion. All works are reviewed with respect to their model structures, application backgrounds and technical advantages, and their performance is discussed using the proposed criteria. Section 4 also summarizes an overall comparison of all the reviewed models/methods. In Section 5, we point out open issues and propose future research directions in this field based on the results of the literature review. Finally, conclusions are provided in the last section.
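To make the three fusion levels concrete, the toy sketch below shows one simple operator per level: raw signals are merged sample by sample, extracted features are concatenated, and per-source decisions are combined by vote. Averaging, the mean/peak feature extractor and majority voting are illustrative choices assumed here, and all function names and data are hypothetical; learning-based methods would typically replace these fixed operators with trained models.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence


def signal_level_fusion(signals: Sequence[Sequence[float]]) -> List[float]:
    """Merge time-aligned raw signals sample by sample (here: simple averaging)."""
    return [mean(samples) for samples in zip(*signals)]


def extract_features(signal: Sequence[float]) -> List[float]:
    """Toy feature extractor: mean and peak value of one raw signal."""
    return [mean(signal), max(signal)]


def feature_level_fusion(signals: Sequence[Sequence[float]]) -> List[float]:
    """Extract features from each source, then concatenate them into one vector."""
    return [f for s in signals for f in extract_features(s)]


def decision_level_fusion(decisions: Sequence[str]) -> str:
    """Merge per-source decisions by majority vote (ties go to the first seen)."""
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical readings from two vibration sensors and three local classifiers.
raw = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
print(signal_level_fusion(raw))
print(feature_level_fusion(raw))
print(decision_level_fusion(["fault", "normal", "fault"]))
```

The three calls print `[2.0, 2.0, 2.0]`, `[2.0, 3.0, 2.0, 3.0]` and `fault`. The further up the pipeline fusion happens, the more raw detail is available to the fusion step, but the stricter the alignment requirements between sources become.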
Overview of data fusion and machine learning
This section provides background information and concepts related to data fusion. It also describes the challenges of data fusion and briefly introduces machine learning and its common models.
Criteria of machine learning for data fusion
In this section, we list the criteria that a data fusion model or algorithm should satisfy, so that they can serve as evaluation metrics for reviewing the literature in the next section. In what follows, the terms data fusion model, method and algorithm are used interchangeably with the same or similar meaning unless otherwise noted. To address the challenges mentioned in Section 2.3, we propose the following criteria to evaluate the performance of data fusion comprehensively and thoroughly.
(1) Efficiency (Ef):
Machine learning for data fusion
In this section, we review the state of the art of machine learning for data fusion by classifying the current works into three categories: signal level data fusion, feature level data fusion and decision level data fusion. In each category, we review the literature based on the type of machine learning. For each work, we summarize its main contributions and characteristics, and comment on its performance based on the proposed criteria. At the end, we summarize and compare all the reviewed
Open issues and future research directions
Based on the detailed survey reported in Section 4, we further indicate a number of open issues and suggest some future research directions.
Conclusions
This paper has presented a comprehensive review of the literature on machine learning for data fusion. We first provided basic background knowledge about data fusion and machine learning. We further proposed a number of criteria for evaluating the works reviewed in this paper, in order to comment clearly on their pros and cons. We carefully reviewed the recent literature according to the fusion level and the type of machine learning involved, and then used a table to summarize our main
Declaration of competing interest
None.
Acknowledgments
This work is sponsored by the NSFC (grants 61672410, 61802293 and U1536202), Academy of Finland (grants 308087 and 314203), National Postdoctoral Program for Innovative Talents (grant BX20180238), the Project funded by China Postdoctoral Science Foundation (grant 2018M633461), the Fundamental Research Funds for the Central Universities (grant JB191504), the Shaanxi Innovation Team project (grant 2018TD-007), and the 111 project (grant B16037).
References (82)
- Information fusion in data privacy: a survey. Inf. Fusion (2012).
- Information fusion for computer security: state of the art and open issues. Inf. Fusion (2009).
- Web information fusion: a review of the state of the art. Inf. Fusion (2008).
- A survey on deep learning for big data. Inf. Fusion (2018).
- Data fusion in heterogeneous networks. Inf. Fusion (2020).
- Multisensor data fusion: a review of the state-of-the-art. Inf. Fusion (2013).
- Multi-sensor data fusion using support vector machine for motor fault detection. Inf. Sci. (2012).
- Fusing and mining opinions for reputation generation. Inf. Fusion (2017).
- Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf. Fusion (2008).
- A deep learning-based multi-sensor data fusion method for degradation monitoring of ball screws.
- Context-based information fusion: a survey and discussion. Inf. Fusion.
- An approach to rank reviews by fusing and mining opinions based on review pertinence. Inf. Fusion.
- A survey on data fusion in Internet of Things: towards secure and privacy-preserving fusion. Inf. Fusion.
- Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch. Inf. Fusion.
- Big data fusion in Internet of Things. Inf. Fusion.
- Trustworthy data fusion and mining in Internet of Things. Future Generat. Comput. Syst.
- Fusion - an aide to data mining in Internet of Things. Inf. Fusion.
- A reversible sketch-based method for detecting and mitigating amplification attacks. J. Netw. Comput. Appl.
- An introduction to multisensor data fusion. Proc. IEEE.
- A review of data fusion techniques. Sci. World J.
- Some studies in machine learning using the game of checkers. I. Comput. Games I.
- Data fusion and IoT for smart ubiquitous environments: a survey. IEEE Access.
- On context awareness for multisensor data fusion in IoT. Springer India.
- From data acquisition to data fusion: a comprehensive review and a roadmap for the identification of activities of daily living using mobile devices. Sensors.
- Data fusion in intelligent transportation systems: progress and challenges – A survey. Inf. Fusion.
- Data mining techniques and applications – a decade review from 2000 to 2011. Expert Syst. Appl.
- Machine learning for science and society. Mach. Learn.
- A survey of machine learning for big data processing. EURASIP J. Adv. Signal Process.
- Multisensor fusion of target attributes and kinematics.
- Multisensor integration and fusion: issues and approaches. SPIE Sensor Fusion.
- Sensor fusion potential exploitation-innovative architectures and illustrative applications. Proc. IEEE.
- Revisions to the JDL data fusion model. Proc. SPIE - Int. Soc. Opt. Eng.
- Data fusion architectures: a survey and comparison.
- Security data collection and data analytics in the Internet: a survey. IEEE Commun. Surv. Tutor.
- Alert fusion based on cluster and correlation analysis.
- An anomaly detection based on data fusion algorithm in wireless sensor networks. Int. J. Distrib. Sens. Netw.
- Estimation of tool wear during CNC milling using neural network-based sensor fusion. Mech. Syst. Signal Process.
- Distributed data fusion using support vector machines. Int. Conf. Inf. Fusion.
- Biometric fusion using enhanced SVM classification.