Elsevier

Information Fusion

Volume 68, April 2021, Pages 85-117
Multi-source information fusion based on rough set theory: A review

https://doi.org/10.1016/j.inffus.2020.11.004

Highlights

  • Homogeneous and heterogeneous MSIF models are systematically introduced.

  • Multi-view rough sets are discussed by combining several rough set models.

  • Parallel computing models based on rough sets are reviewed from the viewpoint of information fusion.

  • Incremental learning fusion technology based on rough sets is surveyed.

  • Cluster ensembles and three-way clustering fusion approaches based on rough sets are reviewed.

Abstract

Multi-Source Information Fusion (MSIF) is a comprehensive, interdisciplinary subject, also referred to as multi-sensor information fusion, that originated in the 1970s. Nowadays, data types are increasingly varied and updates increasingly frequent, which brings new challenges for information fusion in dealing with multi-source data. Consequently, constructing MSIF models suited to different scenarios and applying different fusion technologies are core problems that urgently need to be solved. Rough set theory (RST) provides a computing paradigm for modeling and reasoning with uncertain data, especially for classification problems with noisy, inaccurate or incomplete data. Furthermore, owing to the rapid development of MSIF in recent years, the methodologies of learning under RST are becoming increasingly mature and systematic, unveiling a framework that has not been mentioned in the literature. To better clarify the approaches and applications of MSIF in the RST research community, this paper reviews the existing models and technologies from the perspectives of MSIF models (i.e., homogeneous and heterogeneous MSIF models), multi-view rough set information fusion models (i.e., multi-granulation, multi-scale and multi-view decision information fusion models), parallel computing information fusion models, incremental learning fusion technology and cluster ensemble fusion technology. Finally, research directions and challenges related to RST-based MSIF are discussed. By providing a state-of-the-art understanding of the specialized literature, this survey will directly help researchers understand the research developments of MSIF under RST.

Introduction

The methodology of Multi-Source Information Fusion (MSIF) is based on multi-sensor information fusion, which can obtain more significant information with higher accuracy than a single source (or a single sensor). In the late 1970s, the term fusion, in the comprehensive sense of combining multi-source information, began to appear in various publications. Since then, the theories and technologies of MSIF have developed rapidly into an independent discipline, which has been successfully applied in military command automation systems, strategic warning and defense systems, and multi-target tracking and identification. Moreover, MSIF is gradually spreading to civilian fields such as remote sensing monitoring, medical diagnosis, electronic commerce, wireless communication and fault diagnosis [1], [2], [3], [4].

The amount of data produced in the world each year is rising at a rate of thirty percent [5]. Data is produced by everything around us, exchanged via social media, and transmitted by all kinds of networks, sensors, and mobile devices. In a Big Data environment, data acquisition is no longer limited to a single source that captures all of the information. Data is stored and described in the form of multiple sources. The relationships among data samples from different sources carry various kinds of knowledge structure, expressing information about the samples from multiple perspectives. Fortunately, the fundamental principle of MSIF is to make full use of multiple information sources. According to specific criteria, multiple sources of information can be combined under constraints of spatial redundancy, temporal redundancy or complementary information. MSIF has been studied widely in real-life applications, and diverse theories and methods have been used in it. For instance, Zhang et al. proposed a multiple-metric learning algorithm that jointly learns a set of optimal homogeneous or heterogeneous metrics in order to fuse the data collected from multiple sensors for joint classification [6]. Dasarathy gave a panoramic overview of MSIF in the multi-sensor field from three complementary perspectives [7]. Yang et al. presented a mixed-structure multi-mode data fusion method based on D–S evidence theory and a subjective Bayesian algorithm [8]. Cai et al. proposed a data fusion method for fault detection using a Bayesian network model, and applied it in parallel simulation to improve the diagnostic accuracy of a ground source heat pump system [9]. Gravina et al. presented the motivations and advantages of multi-sensor data fusion and the parameters affecting it, and further discussed how these parameters affect the design choices of data fusion at different levels [10]. Yager introduced a general framework for multi-source data fusion processing and algorithm development [11], and investigated a novel monotonic set measure as a means of representing the multi-source fusion imperative [12]. Li et al. addressed a method of multi-source data clustering based on homogeneous observations, applied to multi-target detection from a cluttered background with misdetection [13]. Saadi et al. presented a framework that allows intelligent merging of multiple data sources and can be applied to urban transportation [14].

Rough Set Theory (RST), put forward by Pawlak [15], [16], provides an effective mathematical tool for dealing with uncertainty. RST plays a critical role in extracting useful features, simplifying information processing, studying expression learning, and finding imprecise and uncertain information. At present, RST has been successfully applied to machine learning, decision analysis, process control, approximate reasoning, pattern recognition, data mining and other intelligent information processing fields [17], [18], [19], [20], [21]. From the perspective of data analysis, the main advantages of RST can be summarized in five points [15]. (i) It does not require any prior knowledge of the data. (ii) It can search for the smallest collection of the data. (iii) It can evaluate the significance of the data. (iv) It allows the use of both qualitative and quantitative data. (v) It can generate a set of decision rules from the data.

Owing to the advantages of RST, increasing effort has been directed to the study of RST-based data analysis, especially in the field of data fusion. The fusion process of MSIF refers to processing the data with different models and approaches, which is essentially information fusion. Interest in MSIF based on RST has increased significantly, leading to a growing number of techniques and methods. For example, Khan and Banerjee proposed the concept of multiple-source approximation systems based on Pawlak approximation spaces, which is a precursor for researchers studying MSIF in RST [22]. Li and Fei discussed a method of information fusion in wireless sensor networks [23]. Liu et al. addressed a framework for performance testing and evaluation of multi-sensor data fusion against a C3I application background [24]. Li et al. proposed a weighted fusion approach based on Granular Computing (GrC) and RST, which has been applied to road safety indicator analysis [25]. Yao et al. came up with a multi-source alert data understanding scheme for security semantic discovery [26]. To solve the fusion problem of multi-source information systems (MsIS), Xu et al. presented internal-confidence and external-confidence degrees to estimate the reliability of each information source [27], and then considered the information fusion issue based on information entropy in fuzzy incomplete information systems [28]. Che et al. employed three approaches to address the information fusion and numerical characterization of uncertain data [29]. Yang et al. developed a multi-granulation method for information fusion [30]. Sang et al. discussed three kinds of multi-source decision methods based on the uncertainty of the decision-making process [31]. In addition, Huang et al. addressed a new fusion method based on fuzzy information granulation, which can translate multi-source interval-valued data into trapezoidal fuzzy granules [32].

In the literature, a detailed survey of information fusion in RST [33] was published in 2019, giving a general overview of state-of-the-art approaches. It provides a hybrid view of information fusion from five primary perspectives: objects, attributes, rough approximations, attribute reduction and decision making. That survey is comprehensive and can serve as a good introduction to information fusion. However, information fusion is essentially a process of integrating multi-source information at multiple levels and from multiple facets. In recent years, the number of proposals in the area of MSIF has increased significantly. There is therefore a gap in the current literature, which calls for a fuller picture of established MSIF models and technologies. It is essential to review past research focuses and present the most recent research trends in MSIF. Hence, this paper aims to present a comprehensive survey of the five major aspects of MSIF based on RST: MSIF fusion models (including homogeneous and heterogeneous MSIF models), multi-view rough set information fusion models, parallel computing information fusion models, and incremental learning and cluster ensemble fusion technologies, as shown in Fig. 1, together with a discussion of new trends in MSIF research.

The main contributions of this review can be summarized as follows.

(1) It summarizes MSIF research achievements and groups the research into five categories: MSIF fusion models, multi-view rough set information fusion models, parallel computing information fusion models, and incremental learning and cluster ensemble fusion technologies (Fig. 1);

(2) It considers the MSIF models according to two perspectives: homogeneous and heterogeneous models;

(3) It combines several rough set models to study MSIF from different perspectives, and collectively refers to these models as a multi-view rough set model;

(4) It introduces the parallel computing model, which can accelerate data fusion and handles large-scale data well via the MapReduce framework;

(5) It covers incremental learning fusion techniques, such as Incremental Learning Information Fusion (ILIF) under the immigration of multiple new objects, attributes and attribute values. Moreover, it identifies related research involving incremental learning;

(6) It reviews cluster ensemble fusion technologies based on RST, as well as the methods of roughness and rough k-means for clustering. In addition, it emphasizes the concepts and applications of three-way clustering;

(7) It suggests several emerging research topics and potential research directions in this area.

Section snippets

Problem description

This section first gives the basic definition of MSIF in Section 2.1. Then, in Section 2.2, the general application mechanism of RST in MSIF is introduced.

Preliminary concepts on RST

In this paper, suppose that $U=\{x_1,x_2,\ldots,x_n\}$ is a universe (a non-empty finite set), $2^U$ denotes the collection of all subsets of $U$, and $|X|$ denotes the cardinality of $X\in 2^U$.
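The snippet above stops before any rough set operators are defined. As a minimal sketch, assuming the preliminaries go on to use the standard Pawlak lower and upper approximations, the following Python code partitions a universe into equivalence classes and approximates a target subset. The toy table, the attribute names `temp` and `cough`, and the helper names are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def equivalence_classes(universe, table, attributes):
    """Partition the universe into blocks of objects that share the same
    values on the chosen attributes (the indiscernibility relation)."""
    blocks = defaultdict(set)
    for x in universe:
        key = tuple(table[x][a] for a in attributes)
        blocks[key].add(x)
    return list(blocks.values())


def approximations(universe, table, attributes, target):
    """Return the Pawlak (lower, upper) approximations of target."""
    lower, upper = set(), set()
    for block in equivalence_classes(universe, table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical toy information table (an assumption for illustration only).
U = ["x1", "x2", "x3", "x4", "x5"]
table = {
    "x1": {"temp": "high", "cough": "yes"},
    "x2": {"temp": "high", "cough": "yes"},
    "x3": {"temp": "low", "cough": "no"},
    "x4": {"temp": "low", "cough": "yes"},
    "x5": {"temp": "low", "cough": "no"},
}
X = {"x1", "x3", "x4"}

lower, upper = approximations(U, table, ["temp", "cough"], X)
# Approximation accuracy as a ratio of cardinalities, |lower| / |upper|.
accuracy = len(lower) / len(upper)
print(sorted(lower), sorted(upper), accuracy)
```

In this toy table only x4 is alone in its equivalence class, so it is the only object that certainly belongs to X, while every class overlaps X and the upper approximation is the whole universe.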

MSIF models in rough set theory

Nowadays, the deep integration of Cyber–Physical Human Systems (CPHS) has triggered explosive growth in data scale and high complexity in data models. The world has entered the era of networked Big Data [48], [49], which is characterized by five Vs, i.e., Volume, Velocity, Variety, Value and Veracity [50]. Big Data is a fast-growing field from both an application and a research point of view. It is worth noting that the most significant thing is to extract useful knowledge from Big

Multi-view rough sets models based information fusion (MvRSIF)

Data about the same object obtained in different ways or at different levels is called multi-view data, which is characterized by polymorphism, multiple sources, multiple descriptions and high-dimensional heterogeneity. Multi-view data exists widely in real life. For example, in the field of information technology, a web page can be described either by the text on the page itself or by the anchor text of links pointing to it. In the task of classifying

Parallel computing model based information fusion (PCIF)

In the age of Big Data, parallel computing can save a great deal of running time and improve efficiency. Parallel computing refers to the process of using multiple computing resources simultaneously to solve a computing problem, and is an effective approach to improving the computing speed and processing power of computer systems. The fundamental idea is to use multiple processors to solve the same problem collaboratively. Namely, the problem to be solved is decomposed into many parts,

Incremental learning based information fusion (ILIF)

At present, with the continuous development and wide application of storage technology, all walks of life have not only accumulated huge amounts of data of various kinds, but also see large volumes of real-time data added at any time. As one of the important data types, multi-source data exists widely in practical applications. In fact, multi-source data may vary rapidly over time. Therefore, dynamics has become one of the important characteristics of multi-source data, which needs accurate

Cluster ensembles based information fusion (CEIF)

In the era of Big Data, it is quite easy to collect unlabeled samples. However, it is hard to obtain labeled samples, since labeling may require considerable manpower and resources. Therefore, cluster analysis, a technology for analyzing unlabeled samples to obtain the distribution characteristics of data, has become an important research topic in machine learning, pattern recognition and data mining [253], [254], [255]. Aiming at the phenomenon of unclear, vague and overlapping

Conclusion: Findings and future directions

In this paper, our aim is to introduce the research progress of MSIF based on RST, including conventional models and techniques: MSIF models (homogeneous and heterogeneous MSIF models), MvRSIF (MgIF, MsIF and MvDIF models), PCIF (MapReduce and MP–DP models), ILIF (ILIF under new immigrating multiple objects, attributes and attribute values) and CEIF (rough sets for cluster ensembles, rough sets for clustering and three-way clustering). In this section, we further discuss this paper

CRediT authorship contribution statement

Pengfei Zhang: Conceptualization, Writing - original draft, Read and contributed to the manuscript. Tianrui Li: Supervision, Project administration, Read and contributed to the manuscript. Guoqiang Wang: Investigation, Read and contributed to the manuscript. Chuan Luo: Writing - review & editing, Read and contributed to the manuscript. Hongmei Chen: Visualization, Read and contributed to the manuscript. Junbo Zhang: Methodology, Read and contributed to the manuscript. Dexian Wang: Structure

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors thank the anonymous reviewers and the editors for their valuable comments and suggestions in improving this paper.

This work was partially supported by the National Natural Science Foundation of China (Nos. 61573292, 62076171, 61976182), Applied Fundamental Research Program of Science & Technology Department of Sichuan Province (No. 2019YJ0084), and Fundamental Research Funds for the Central Universities (No. 2682020CX89).

References (322)

  • YaoY. et al.

    Multi-source alert data understanding for security semantic discovery based on rough set theory

    Neurocomputing

    (2016)
  • XuW. et al.

    A novel approach to information fusion in multi-source datasets: a granular computing viewpoint

    Inf. Sci.

    (2017)
  • CheX. et al.

    Information fusion and numerical characterization of a multi-source information system

    Knowl.-Based Syst.

    (2018)
  • WeiW. et al.

    Information fusion in rough set theory: An overview

    Inf. Fusion

    (2019)
  • AnA. et al.

    Discovering rules for water demand prediction: an enhanced rough-set approach

    Eng. Appl. Artif. Intell.

    (1996)
  • WitloxF. et al.

    The application of rough sets analysis in activity-based modelling. opportunities and constraints

    Expert Syst. Appl.

    (2004)
  • LiZ. et al.

    A novel three-way decision method in a hybrid information system with images and its application in medical diagnosis

    Eng. Appl. Artif. Intell.

    (2020)
  • WangP. et al.

    A three-way decision method based on Gaussian kernel in a hybrid information system with images: An application in medical diagnosis

    Appl. Soft Comput.

    (2019)
  • DobreC. et al.

    Intelligent services for big data science

    Future Gener. Comput. Syst.

    (2014)
  • CamachoJ.

    Visualizing big data with compressed score plots: approach and research challenges

    Chemometr. Intell. Lab. Syst.

    (2014)
  • GuanY.-Y. et al.

    Set-valued information systems

    Inform. Sci.

    (2006)
  • QianY. et al.

    Set-valued ordered information systems

    Inform. Sci.

    (2009)
  • DaiJ. et al.

    Entropy measures and granularity measures for set-valued information systems

    Inform. Sci.

    (2013)
  • ZhangJ. et al.

    Rough sets based matrix approaches with dynamic attribute variation in set-valued information systems

    Internat. J. Approx. Reason.

    (2012)
  • LuoC. et al.

    Dynamic maintenance of approximations in set-valued ordered decision systems under the attribute generalization

    Inform. Sci.

    (2014)
  • LuoC. et al.

    Fast algorithms for computing rough approximations in set-valued decision systems while updating criteria values

    Inform. Sci.

    (2015)
  • DaiJ. et al.

    Fuzzy rough set model for set-valued data

    Fuzzy Sets and Systems

    (2013)
  • WeiW. et al.

    Fuzzy rough approximations for set-valued data

    Inform. Sci.

    (2016)
  • ZhangH.-Y. et al.

    Feature selection and approximate reasoning of large-scale set-valued decision tables based on α-dominance-based quantitative rough sets

    Inform. Sci.

    (2017)
  • HuangY. et al.

    Dynamic variable precision rough set approach for probabilistic set-valued information systems

    Knowl.-Based Syst.

    (2017)
  • ZhaoX.R. et al.

    Three-way decisions with decision-theoretic rough sets in multiset-valued information tables

    Inform. Sci.

    (2020)
  • ZhangJ. et al.

    Composite rough sets for dynamic data mining

    Inform. Sci.

    (2014)
  • ZengA. et al.

    A fuzzy rough set approach for incremental feature selection on hybrid information systems

    Fuzzy Sets and Systems

    (2015)
  • ZengA. et al.

    Dynamical updating fuzzy rough approximations for hybrid data under the variation of attribute values

    Inform. Sci.

    (2017)
  • LinG. et al.

    An information fusion approach by combining multigranulation rough sets and evidence theory

    Inform. Sci.

    (2015)
  • SunB. et al.

    Heterogeneous multigranulation fuzzy rough set-based multiple attribute group decision making with heterogeneous preference information

    Comput. Ind. Eng.

    (2018)
  • ZhangL. et al.

    Agent evaluation based on multi-source heterogeneous information table using TOPSIS

    Adv. Eng. Inform.

    (2019)
  • ZhangW.-X. et al.

    Incomplete information system and its optimal selections

    Comput. Math. Appl.

    (2004)
  • HurlbertG. et al.

    On universal cycles for multisets

    Discrete Math.

    (2009)
  • EliahouS. et al.

    Mutually describing multisets and integer partitions

    Discrete Math.

    (2013)
  • YuanY.-H. et al.

    A novel multiset integrated canonical correlation analysis framework and its application in feature fusion

    Pattern Recognit.

    (2011)
  • GirishK. et al.

    Multiset topologies induced by multiset relations

    Inform. Sci.

    (2012)
  • RiesgoÁ. et al.

    Basic operations for fuzzy multisets

    Internat. J. Approx. Reason.

    (2018)
  • PanQ.

    Multi-Source Information Fusion Theory and its Applications

    (2013)
  • LlinasJ. et al.

    Multisensor Data Fusion

    (1990)
  • HallD.

    Mathematical Techniques in Multisensor Data Fusion

    (1992)
  • HallD.

    Handbook of Multisensor Data Fusion

    (2001)
  • X. Dong, F. Naumann, Data fusion - Resolving data conflicts for integration, 2 (2) (2009)...
  • DasarathyB.V.

    Multi-sensor, multi-source information fusion: architecture, algorithms, and applications-a panoramic overview

  • YagerR.R.

    Set measure directed multi-source information fusion

    IEEE Trans. Fuzzy Syst.

    (2011)