Fuzzy Sets and Systems

Volume 368, 1 August 2019, Pages 1-19

Incremental feature weighting for fuzzy feature selection

https://doi.org/10.1016/j.fss.2018.10.021

Abstract

Feature selection presents many challenges during online learning. In this study, we focus on fuzzy feature selection for fuzzy data streams. We present a novel incremental feature weighting method with two main phases: offline fuzzy feature selection and online fuzzy feature selection. A sliding window is used to partition the fuzzy data set. Each fuzzy input feature is assigned a weight in [0,1] according to the mutual information shared between the input features and the output feature. These weights are employed to assess the candidate fuzzy feature subsets in the current window. Based on these subsets, in the first sliding window the offline fuzzy feature selection algorithm obtains the fuzzy feature subsets by combining the backward feature selection method with the fuzzy feature selection index. The online feature selection algorithm is then performed in each new sliding window: the feature subset in the current window is updated by combining the fuzzy feature selection results from the previous sliding window with the current candidate fuzzy feature set according to the importance levels of the fuzzy input features. Finally, the evolving relationships of the fuzzy input features are identified from the fuzzy feature weights between sliding windows. Simulation results showed that the proposed algorithm obtains significantly better adaptability and prediction accuracy than existing algorithms.

Introduction

The online monitoring of high-dimensional dynamical data has become increasingly important for observing systems in a wide range of advanced applications, such as telephone records, large sets of web pages, multimedia data, and sets of retail chain transactions [1]. Due to the complexity of dynamical systems, incremental learning is essential for knowledge discovery, which is an important aspect of human intelligence. Many applications need to process data in a sequence, which has led to the development of a wide range of incremental learning methods [2], [3]. In particular, two types of situations must be addressed. In the first situation, the data set cannot be collected in one pass and batch computing cannot be conducted, e.g., online applications [4] or interactive queries [5]. In the second situation, the data set is excessively large and cannot be processed in one pass due to limitations on computation capability and memory size. Thus, the data set must be divided into segments that are added successively. The relationships between the newly added data and the stored data structure must then be analyzed in order to acquire new knowledge. It is difficult to apply an incremental feature selection process to compensate for different operational modes because the importance levels of features can change dynamically during the overall learning process. For example, specific features may be much more important initially than later in the process. Therefore, many incremental feature selection algorithms have been proposed for synchronously updating the classifiers [6], [7], [8], [10]. In [6], the incrementally selected features for representing the information in the overall data set are evaluated using classifiers, which increases the computational cost of the feature selection process.
In [7], a wrapper feature selection algorithm with an incremental Bayesian classifier (WFSIBC) was proposed, where a greedy algorithm is used to find the optimal feature subset, which increases the time and space complexity. The incremental feature neighborhood rough set (IFNRS) algorithm was presented by [8] to select features incrementally, but without considering constraints on discrete data. An incremental feature selection algorithm based on information granularity was proposed by [9], where the non-matrix structure improves the efficiency of the algorithm. Other incremental feature selection algorithms, such as those presented by [10], [11], [12], [13], can be employed for regression prediction. In the method described by [10], useful features are selected incrementally by combining clustering algorithms with four new feature quality indices, but the number of clusters needs to be predefined. A genetic algorithm and the K-means algorithm are used to determine the objective function and the optimal range, respectively, in the method proposed by [11], but the resulting feature subset is not globally optimal. An online unsupervised multi-view feature selection (OMVFS) algorithm was presented by [12], where feature selection is embedded directly in a clustering algorithm that retains the local information structure, but the parameters of the clustering algorithm need to be set, which leads to poor adaptability. The feature selection based on data streams (FSDS) algorithm was proposed by [13] for determining the importance levels of features according to the regression results based on approximate data, but it has no independent evaluation index and it is mainly employed for updating the regression model.

It is difficult to apply an incremental feature selection process to a fuzzy feature space with fuzzy linguistic terms in order to improve the interpretability of the system. A significant problem when learning fuzzy features from data is the so-called curse of dimensionality, where the fuzzy feature space has higher dimensionality than the original feature space, which increases the complexity of the system. This problem was addressed by [14], where the fuzzy features were selected by evaluating the importance level of each fuzzy feature according to the minimum–maximum learning rule and an extended matrix built from the fuzzy data, but this algorithm is unsuitable for dynamic systems. A local fuzzy feature selection strategy based on switching to a neighboring model was proposed by [15] for capturing the local dynamics, but it is not possible to deactivate and reactivate the input variables in different stages. An online incremental feature weighting method was presented by [16] for updating the classifiers according to a separability criterion, where a weight was calculated for each feature and compared across all of the features. The most important feature received a weight of 1 and the others were weighted between 0 and 1. The weights of all the features were then used directly in the model for updating and inference, but this method may automatically lead to the curse of dimensionality during rule evolution. Soft feature selection using a feature weighting method was presented by [17] based on a classifier, where the Fisher separability criterion was applied in the feature space to select suitable features, although it was more convenient to apply in the empirical feature space.
An adaptive incremental fuzzy feature selection method using a neo-fuzzy neural network was presented by [18], which selects the model inputs and updates the network weights simultaneously, although this method leads to redundant features because the dependencies between the features are not analyzed. As described above, the inclusion of feature weights discriminates between more important and less important features, thereby providing additional information to users and experts through insights into the feature selection process and its interpretability. However, in order to implement feature selection, these algorithms depend on a classifier or regression model, which increases the complexity of the model selection procedure.

Section snippets

Proposed approach

In this study, we propose an incremental feature weighting for fuzzy feature selection (IWFFS) algorithm with two main phases: offline fuzzy feature selection and online fuzzy feature selection. First, the fuzzy data are partitioned by sliding windows. In the first sliding window, the offline fuzzy feature selection algorithm is applied, where the weight of each fuzzy input feature is calculated according to the mutual information between the fuzzy input features and the output feature, and
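The weighting step described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact estimator: it assumes a simple histogram-based mutual information estimate between each (defuzzified) input feature and the output, with the weights scaled into [0,1] by the maximum; the bin count and function names are hypothetical choices.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X; Y) in nats for two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                      # joint distribution
    px = pxy.sum(axis=1, keepdims=True)            # marginal of X
    py = pxy.sum(axis=0, keepdims=True)            # marginal of Y
    nz = pxy > 0                                   # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def feature_weights(X, y, bins=8):
    """Weight each column of X by its MI with y, scaled to [0, 1]."""
    mi = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    return mi / mi.max() if mi.max() > 0 else mi
```

Under this scheme the most informative feature always receives weight 1, and irrelevant features receive weights close to 0, mirroring the importance-level interpretation used in the paper.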

Basic concept

The incremental feature weighting concept is an attempt to reduce the curse of dimensionality. This method is generally used for incremental feature selection by assigning continuous weights in the range of [0,1], which can be treated as the importance levels of the features in dynamic systems, instead of using crisp weights from {0,1}. When the feature weights approach 0, their importance level is relatively low compared with others that have values near 1. The main advantage of feature
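The contrast between continuous weights in [0,1] and crisp weights from {0,1} can be made concrete: a crisp subset is recovered from the graded weights by thresholding. The threshold value and function name below are illustrative assumptions, not quantities specified by the paper.

```python
def select_features(weights, threshold=0.2):
    """Derive a crisp feature subset from continuous importance weights:
    keep every feature whose weight in [0, 1] clears the (hypothetical)
    threshold; weights near 0 are dropped, weights near 1 survive."""
    return [j for j, w in enumerate(weights) if w > threshold]
```

For example, `select_features([0.9, 0.05, 0.4])` keeps features 0 and 2, while the graded weights themselves remain available as importance levels for the dynamic system.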

IWFFS framework

To enhance the effectiveness of incremental fuzzy feature selection, reduce the complexity, and identify the evolving relationships for each fuzzy input feature, the proposed framework for the IWFFS is shown in Fig. 1. First, the fuzzy data are partitioned using sliding windows, where offline fuzzy feature selection is used in the first sliding window and online fuzzy feature selection is applied in each of the sliding windows successively. During the offline fuzzy feature selection process,
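The two-phase control flow of the framework can be outlined as follows. This is a structural sketch only: the window partitioning is assumed to be non-overlapping, and `offline_select` / `online_update` are hypothetical stand-ins for the paper's offline and online selection procedures.

```python
def sliding_windows(data, size):
    """Split a stream into consecutive, non-overlapping windows of `size`."""
    for start in range(0, len(data) - size + 1, size):
        yield data[start:start + size]

def iwffs_outline(stream, size, offline_select, online_update):
    """Two-phase scheme: the offline phase runs once on the first window;
    every later window refines the current subset online."""
    subset = None
    for i, window in enumerate(sliding_windows(stream, size)):
        subset = offline_select(window) if i == 0 else online_update(subset, window)
    return subset
```

The key design point this sketch captures is that only the first window pays the cost of full offline selection; subsequent windows reuse the previous subset and update it incrementally.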

Experimental setup

Several experiments were conducted to test the proposed dynamic fuzzy feature selection method. All of the data sets were downloaded from the UCI Machine Learning Repository [26]. The statistics for the data sets used in our experiments are shown in Table 1, which we employed to conduct all of our experiments and analysis. Table 1 shows that some of the data sets had continuous features but discrete classes, such as the classification data sets comprising SD, BCH, Glass, Liver, Wine,

Conclusion

In this study, we proposed the IWFFS algorithm, which mainly evaluates the importance levels of the fuzzy features according to the weights of the fuzzy features in the current window. In the offline fuzzy feature selection stage, the optimal fuzzy input feature subset is obtained by the backward feature selection algorithm and the fuzzy feature selection index. In the online fuzzy feature selection stage, based on the candidate fuzzy feature subset in the current sliding window and the
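The backward selection step referenced in the offline stage follows a standard greedy pattern, sketched below. Here `index_fn` is a hypothetical stand-in for the paper's fuzzy feature selection index; the sketch simply drops one feature at a time as long as the index does not decrease.

```python
def backward_selection(features, index_fn):
    """Greedy backward elimination: starting from the full set, remove a
    feature whenever doing so does not lower the selection index."""
    current = list(features)
    improved = True
    while improved and len(current) > 1:
        improved = False
        base = index_fn(current)                       # score of current subset
        for f in list(current):
            trial = [g for g in current if g != f]     # try dropping feature f
            if index_fn(trial) >= base:
                current = trial                        # keep the smaller subset
                improved = True
                break
    return current
```

Each pass costs one index evaluation per remaining feature, so the sketch makes O(d^2) evaluations in the worst case for d features, which is why the paper confines it to the offline stage on the first window.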

Acknowledgements

This study was supported by the National Natural Science Foundation of China (Grant No. 61572073), National Key R&D Program of China (No. 2017YFB0306403).

References (31)

  • I.S. Bilal et al., Diversification of fuzzy association rules to improve prediction accuracy
  • S.S. Naqvi et al., Feature quality-based dynamic feature selection for improving salient object detection, IEEE Trans. Image Process. (2016)
  • W. Zhao et al., A dynamic feature selection method based on combination of GA with K-means
  • W. Shao et al., Online unsupervised multi-view feature selection
  • H. Huang et al., Unsupervised feature selection on data streams