Elsevier

Knowledge-Based Systems

Volume 227, 5 September 2021, 107223

Feature selection for dynamic interval-valued ordered data based on fuzzy dominance neighborhood rough set

https://doi.org/10.1016/j.knosys.2021.107223

Abstract

Feature selection approaches based on incremental learning can improve the efficiency of reduction algorithms on datasets with dynamic characteristics, and they have attracted increasing research attention. Nevertheless, there is currently no work on incremental feature selection for dynamic interval-valued ordered data. Interval-valued ordered data is a generalized form of single-valued ordered data and is more widely used in practice. However, the endpoints of interval numbers are easily polluted by noise, which makes the knowledge granules very sensitive. Motivated by these two issues, we study incremental feature selection approaches based on a fuzzy dominance neighborhood rough set (FDNRS) for dynamic interval-valued ordered data. First, we propose the FDNRS model for an interval-valued ordered decision system (IvODS) and investigate its properties. Second, a robust conditional entropy is defined on the proposed model. This conditional entropy measures the degree of monotonic consistency of the IvODS, so it is used as the evaluation metric in a heuristic feature selection algorithm. Finally, two incremental feature selection algorithms are proposed on this basis. Experiments on nine public datasets evaluate the robustness of the proposed metric and the performance of the incremental algorithms. The results verify that the proposed metric is robust and that the incremental algorithms are effective and efficient for updating reducts in dynamic IvODS.

Introduction

With the development of the information age, various kinds of complex data need to be handled in different fields, and interval-valued data is an important representative. Interval-valued data is widely used in the real world to characterize inaccurate and ambiguous information, such as fluctuations of commodity prices [1], changes of temperature [2], and ranges of physiological indicators [3]. In multi-criteria decision analysis problems, interval-valued data that follows a preference-ordered relation is called interval-valued ordered data [4]. In practical applications, interval-valued ordered data evolves over time, giving dynamic interval-valued ordered data [5], [6], which poses challenges for efficient data mining.

Feature selection is a common dimensionality reduction method in data mining. It identifies the more relevant features and reduces the dimension of the data, thereby improving the classification ability of learning models [7], [8], [9], [10], [11]. For dynamic data, some traditional feature selection methods suffer from low computational efficiency. To improve efficiency, feature selection algorithms with incremental techniques have attracted increasing research attention [12], [13], [14], [15], [16]. Nevertheless, to date there is no incremental feature selection method for dynamic interval-valued ordered data. To fill this gap, we study feature selection with incremental techniques on dynamic interval-valued ordered data.

Rough set theory (RST) is a granular computing tool that is widely used to deal with uncertain and vague information. In RST, interval-valued data is represented as an interval-valued information system (IvIS). In recent years, several extended rough set models for IvIS have been proposed, as shown in Table 1.

Although some dominance-based rough set approach (DRSA) models have been extended to IvODS in the above studies, these models cannot describe the preference-ordered relation between objects in IvODS both qualitatively and quantitatively. The fuzzy preference based rough set model proposed by Hu et al. [25] can make up for this deficiency, so extending it to IvODS is worthwhile. However, this model is not robust: it does not consider that the boundaries of interval numbers are easily disturbed by noise, which perturbs the endpoint values. This shortcoming deprives the knowledge granules of fault tolerance (flexibility) and may provide decision-makers with wrong information, eventually leading to wrong decisions. Inspired by this, we introduce the idea of neighborhood into the fuzzy preference based rough set model and propose a new model that makes the knowledge granules robust, namely the FDNRS model of IvODS.

Uncertainty measurement is an important research topic in RST. In recent years, RST-based uncertainty metrics for interval-valued data have attracted the attention of many scholars. Some representative works are shown in Table 2. However, these metrics do not take into account the preference-ordered relation between objects in IvODS. For ordered data, Hu et al. proposed rank conditional entropy and fuzzy rank conditional entropy [26], which were then applied to feature selection [27] and decision trees [28] for monotonic classification tasks. Inspired by this, we introduce an FDNRS-based conditional entropy, called fuzzy dominance neighborhood conditional entropy (FDNCE), to evaluate how consistently samples are ordered by the features and by the decisions in IvODS. In this study, FDNCE is used as the feature evaluation index for feature selection in IvODS.

Feature selection is also called attribute reduction in RST. Some RST-based attribute reduction methods have been extended or further improved for interval-valued data, as shown in Table 3. However, these attribute reduction methods have two shortcomings. On the one hand, they do not consider interval-valued data with a preference-ordered relation. On the other hand, for interval-valued data with dynamic characteristics, they incur a high time cost, because they must be executed again from the beginning whenever new data arrives or old data is removed, which causes many unnecessary calculations. Therefore, it is worthwhile to study an efficient attribute reduction method that can be applied to dynamic interval-valued ordered data.

Feature selection with an incremental mechanism can efficiently extract the necessary attributes from dynamic datasets. In recent years, research on incremental feature selection has attracted the attention of many scholars. Some recent works are presented in Table 4. Although much work has been done on incremental feature selection methods, the existing methods are not suitable for dynamic interval-valued ordered data. This gap motivates our study.

In this study, we propose incremental feature selection methods based on the FDNRS model for dynamic interval-valued ordered datasets with time-evolving objects. The major contributions of this study are as follows.

  • We propose a new rough set model, FDNRS, for IvODS and give reasonable explanations of its approximation operators. Moreover, the relevant properties of this model are presented and proved.

  • We define a robust uncertainty metric, FDNCE, based on the FDNRS model to evaluate the degree of ranking consistency of objects in IvODS. This metric is proven to be non-monotonic and is then combined with a heuristic feature selection strategy.

  • Based on the above results, we propose two incremental feature selection algorithms for the cases where a group of objects is added to or deleted from an IvODS, respectively.

  • Comparison experiments are performed on public datasets, and the results indicate the robustness of the proposed metric and the effectiveness and efficiency of the proposed incremental algorithms.

The remainder of the paper is organized as follows. Section 2 introduces the related knowledge. In Section 3, the FDNRS model of IvODS is proposed, and its relevant properties are investigated. Section 4 proposes FDNCE and an FDNCE-based heuristic non-monotonic feature selection algorithm for IvODS. In Section 5, two incremental feature selection methods are introduced. The results and analysis of our experiments are reported in Section 6. Finally, Section 7 summarizes the study and outlines future work.

Preliminaries

In this section, some basic concepts are introduced, which can be found in the literature [4], [54].

Fuzzy dominance neighborhood rough set to IvODS

In this section, we propose a new model for IvODS, called the FDNRS model. This model considers the preference-ordered relation between objects in IvODS both qualitatively and quantitatively. In addition, the proposed model incorporates the idea of neighborhood to reduce the influence of noise on knowledge. The relevant definitions and properties are introduced as follows.
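To make the two ingredients named above concrete, the following is a minimal illustrative sketch and not the definition used in this paper, which is not reproduced in this excerpt. It assumes a logistic preference function (a common choice in fuzzy preference models) applied to the averaged endpoint differences of two intervals, and it models the neighborhood idea by treating endpoint differences no larger than a radius `delta` as noise. The function name `fuzzy_dominance` and the parameters `k` and `delta` are hypothetical.

```python
import math


def fuzzy_dominance(a, b, k=5.0, delta=0.1):
    """Degree in [0, 1] to which interval a = (lo, hi) is preferred to b.

    Hypothetical stand-in for a fuzzy dominance neighborhood relation:
    k controls the steepness of the logistic preference function, and
    delta is an assumed neighborhood radius for the interval endpoints.
    """
    def soften(d):
        # Endpoint differences inside the neighborhood are treated as noise.
        return 0.0 if abs(d) <= delta else d

    d_lo = soften(a[0] - b[0])
    d_hi = soften(a[1] - b[1])
    # Logistic preference on the mean endpoint difference: 0.5 means
    # indifference, values near 1 mean a clearly dominates b.
    return 1.0 / (1.0 + math.exp(-k * (d_lo + d_hi) / 2.0))
```

Under these assumptions, a small perturbation of one endpoint (smaller than `delta`) leaves the degree at 0.5, which is the kind of fault tolerance the neighborhood idea is meant to provide, while a clear gap between the intervals pushes the degree toward 1.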

Conditional entropy based on FDNRS and non-monotonic feature selection in IvODS

As a common uncertainty measure, information entropy is widely used in feature selection tasks [27], [28], [39]. In this section, we first propose a conditional entropy based on FDNRS, called FDNCE, and analyze its monotonicity. Afterwards, we define a non-monotonic reduct search strategy using FDNCE. Finally, we introduce a heuristic feature selection algorithm with the non-monotonic reduct search strategy.
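The general shape of an entropy-driven heuristic search can be sketched as follows. This is only an illustration under stated assumptions: it uses ordinary Shannon conditional entropy on discrete data as a stand-in for FDNCE (whose definition is not given in this excerpt), and the stopping rule "stop when no candidate strictly lowers the metric" is an assumed choice, not necessarily the search strategy defined in this section. The names `conditional_entropy` and `greedy_select` are hypothetical.

```python
from collections import Counter
import math


def conditional_entropy(rows, labels, features):
    """Shannon H(labels | features) on discrete rows; a stand-in metric."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        groups.setdefault(key, Counter())[y] += 1
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            # -p(x, y) * log2 p(y | x)
            h -= (c / n) * math.log2(c / size)
    return h


def greedy_select(rows, labels, n_features):
    """Forward selection: add the feature that lowers the metric the most."""
    selected = []
    best = conditional_entropy(rows, labels, selected)
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        score, feat = min(
            (conditional_entropy(rows, labels, selected + [f]), f)
            for f in candidates
        )
        if score >= best:  # assumed stopping rule: no strict improvement
            break
        selected.append(feat)
        best = score
    return selected, best
```

Shannon conditional entropy never increases when features are added, whereas the text states that FDNCE is non-monotonic. With a non-monotonic metric, adding a feature can make the score worse, which is why a search of this kind compares each candidate against the current best value instead of against the value on the full feature set.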

Incremental feature selection for dynamic IvODS with the variation of multiple objects

For dynamic IvODS, employing the HFS-IvO algorithm to compute a reduct is very time-consuming, especially on large data, because this algorithm treats the changed IvODS as a new one and recalculates knowledge from scratch. To improve efficiency, this section presents two incremental algorithms for feature selection on the basis of the HFS-IvO algorithm.
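The saving that an incremental strategy aims for can be illustrated on a generic pairwise relation matrix. The sketch below is not one of the two algorithms proposed in this section. It only assumes that knowledge is derived from a matrix of pairwise degrees between objects, and it shows that adding objects requires computing only the entries that involve a new object, while deleting objects requires no new computation. All function names are hypothetical, and `degree` is any pairwise function supplied by the caller.

```python
def relation_matrix(objects, degree):
    """Full pairwise matrix, computed from scratch: n * n evaluations."""
    return [[degree(a, b) for b in objects] for a in objects]


def add_objects(matrix, old_objects, new_objects, degree):
    """Extend an existing matrix, evaluating only pairs with a new object."""
    all_objects = old_objects + new_objects
    # Existing rows keep their old entries and gain columns for new objects.
    extended = [
        row + [degree(a, b) for b in new_objects]
        for row, a in zip(matrix, old_objects)
    ]
    # Each new object contributes one complete row.
    for a in new_objects:
        extended.append([degree(a, b) for b in all_objects])
    return extended


def remove_objects(matrix, keep_indices):
    """Delete objects by keeping a sub-matrix; no degree is re-evaluated."""
    return [[matrix[i][j] for j in keep_indices] for i in keep_indices]
```

For n existing objects and m added objects, this evaluates (n + m)² − n² pairs instead of (n + m)², so the benefit is largest when the added group is small relative to the existing data.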

Experiments and analysis

In this section, we perform a series of experiments to test the robustness of the proposed metric and evaluate the performance of the proposed incremental feature selection algorithms. The experiments are run on a computer with an Intel(R) Core(TM) i7-8700 CPU at 3.20 GHz, 16.0 GB of memory, and 64-bit Windows 10. The algorithms are coded in Java and run on the Java platform. The code of the algorithms can be downloaded from the GitHub homepage.

Conclusion and future work

In this study, we propose incremental feature selection methods based on FDNRS for dynamic interval-valued ordered data. The main contributions are as follows: (1) We propose an FDNRS model for IvODS and present its relevant properties. (2) Based on the proposed model, a robust conditional entropy (i.e., FDNCE) is proposed for attribute reduction of IvODS. (3) For dynamically adding objects to or deleting objects from an IvODS, we develop two corresponding incremental feature selection algorithms.

CRediT authorship contribution statement

Binbin Sang: Methodology, Validation, Writing - original draft, Writing - review & editing. Hongmei Chen: Conceptualization, Resources, Visualization, Supervision, Project administration, Funding acquisition. Lei Yang: Formal analysis, Data curation. Tianrui Li: Resources, Supervision, Funding acquisition. Weihua Xu: Resources, Funding acquisition. Chuan Luo: Resources, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61976182, 61572406, 62076171, 61602327, 61876157, 61976245), Sichuan Key R&D project, China (2020YFG0035), Key program for International S&T Cooperation of Sichuan Province, China (2019YFH0097).

References (64)

  • Leung, Yee et al. A rough set approach for the discovery of classification rules in interval-valued information systems. Internat. J. Approx. Reason. (2008)
  • Yang, Xibei et al. Dominance-based rough set approach to incomplete interval-valued information system. Data Knowl. Eng. (2009)
  • Zhang, Hongying et al. Variable-precision-dominance-based rough set approach to interval-valued information systems. Inform. Sci. (2013)
  • Yang, Xibei et al. α-dominance relation and rough sets in interval-valued information systems. Inform. Sci. (2015)
  • Hu, Qinghua et al. Fuzzy preference based rough sets. Inform. Sci. (2010)
  • Dai, Jianhua et al. Uncertainty measurement for interval-valued decision systems based on extended conditional entropy. Knowl.-Based Syst. (2012)
  • Dai, Jianhua et al. Uncertainty measurement for interval-valued information systems. Inform. Sci. (2013)
  • Huang, Bing et al. Information granulation and uncertainty measures in interval-valued intuitionistic fuzzy information systems. European J. Oper. Res. (2013)
  • Dai, Jianhua et al. Uncertainty measurement for incomplete interval-valued information systems based on α-weak similarity. Knowl.-Based Syst. (2017)
  • Xie, Ningxin et al. New measures of uncertainty for an interval-valued information system. Inform. Sci. (2019)
  • Zhang, Xiao et al. Multi-confidence rule acquisition and confidence-preserved attribute reduction in interval-valued decision systems. Internat. J. Approx. Reason. (2014)
  • Zeng, Anping et al. A fuzzy rough set approach for incremental feature selection on hybrid information systems. Fuzzy Sets and Systems (2015)
  • Lang, Guangming et al. Incremental approaches for updating reducts in dynamic covering information systems. Knowl.-Based Syst. (2017)
  • Das, Asit K. et al. A group incremental feature selection for classification using rough set theory based genetic algorithm. Appl. Soft Comput. (2018)
  • Wei, Wei et al. Discernibility matrix based incremental attribute reduction for dynamic data. Knowl.-Based Syst. (2018)
  • Cai, Mingjie et al. Incremental approaches to updating reducts under dynamic covering granularity. Knowl.-Based Syst. (2019)
  • Ni, Peng et al. Incremental feature selection based on fuzzy rough sets. Inform. Sci. (2020)
  • Liu, Ye et al. Discernibility matrix based incremental feature selection on fused decision tables. Internat. J. Approx. Reason. (2020)
  • Du, Wensheng et al. Approximate distribution reducts in inconsistent interval-valued ordered decision tables. Inform. Sci. (2014)
  • Zadeh, Lotfi A. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems (1997)
  • Zhang, Yingying et al. Incremental updating of rough approximations in interval-valued information systems under attribute generalization. Inform. Sci. (2016)
  • Huang, Qianqian et al. Dynamic dominance rough set approach for processing composite ordered data. Knowl.-Based Syst. (2020)