
Knowledge-Based Systems

Volume 129, 1 August 2017, Pages 4-16

Stepwise optimal scale selection for multi-scale decision tables via attribute significance

https://doi.org/10.1016/j.knosys.2017.04.005

Abstract

Hierarchically structured data are very common, or even unavoidable, in data mining and knowledge discovery from the perspective of granular computing in the real world. Based on this circumstance, the multi-scale information system, introduced by Wu and Leung, extends the theory and application of information systems. In such a table, objects may take different values under the same attribute measured at different scales. Scale selection is currently the main issue in multi-scale information systems, and optimal scale selection aims to choose a proper decision table for final decision making or classification. In this paper, we first propose the concept of multi-scale attribute significance and, in the sense of binary classification, give another two equivalent definitions. Based on this concept of significance, the paper then introduces a novel stepwise optimal scale selection approach that obtains one optimal scale combination at less time cost than the lattice model. In particular, for inconsistent multi-scale decision tables, different types of consistency are considered, each with its own requirement for optimal scale selection. Finally, five algorithms are designed and six numerical experiments are employed to illustrate the feasibility and efficiency of the proposed model.

Introduction

Granular computing (GrC), which derives from the topic of fuzzy information granulation first proposed by Zadeh in 1979 [57], [58], is employed as a powerful tool for complex problem solving, massive data mining and fuzzy information processing. Several decades have witnessed the rapid development of GrC [1], [2], [16], [19], [23], [24], [27], [33], [34], [43], [45], [47], [48], [49], [51], [55], [59], [60]. As a primitive notion, a granule is a clump of objects drawn together by the criteria of indistinguishability, similarity or functionality [58]. The elements within a granule, satisfying a given specification, are considered as a whole rather than as individuals. Therefore, with respect to a particular level of granularity, a universe can be represented by a set of granules. This process is called information granulation, and it provides an effective approach to solving a complex problem at a certain level of granulation. The partition model proposed by Yao [52], serving as a significant and commonly used model for GrC, is constructed by granulating a finite universe of discourse into a family of pairwise disjoint subsets under an equivalence relation. Furthermore, Bittner and Stell [3], Yao [49], and Wu and Leung [41] have studied multiple granulation hierarchies. Recently, Xu et al. [46] and Hu et al. [13] studied information fusion and machine learning, respectively, from the viewpoint of GrC.

Rough set theory (RST), originally proposed by Pawlak [28], has played a vital role in the extension and development of GrC. As a powerful tool of soft computing, it performs well in the construction, interpretation and representation of granules in a universe by an equivalence relation, and it provides more precise concepts for defining and analyzing the notions of GrC. From the viewpoint of GrC, equivalence granules induced by an equivalence relation are the basic components for representation and approximation in the Pawlak approximation space.

Some extensions of RST concerning the acquisition of knowledge from information tables via an objective knowledge induction process have been proposed successively, such as probabilistic rough sets [38], [39], [40], [50], [53], [54], dominance-based rough sets [4], [6], [7], [20], [36], multigranulation rough sets [10], [11], [12], [14], [21], [25], [26], [30], [31], [32], [56], etc. In this literature, an information table in which each object takes only one value on each attribute is called a single-scale information table (SSIT). In an SSIT, an equivalence relation is determined by a subset of attributes and granulates the universe of discourse into equivalence granules, and the inclusion relation between subsets of attributes implies a coarsening or refining relation between granules, which induces a multi-layered granulation structure on the universe. However, objects are usually measured at different scales under the same attribute [17]: the same object may take many values under the same attribute when it is measured at different scales, which induces a hierarchical structure for knowledge acquisition. In order to deal with such problems, Wu and Leung [41] proposed a special information table called the multi-scale information table (MSIT), in which data are represented at different scales at different levels of granulation, with a granular information transformation from a finer to a coarser labelled partition [42].
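To make this structure concrete, the following is a minimal Python sketch of a toy MSIT (the attribute names and values are hypothetical, not taken from the paper): each attribute records one value per object at each of its scales, and fixing one scale per attribute extracts a single-scale table.

```python
# Hypothetical toy multi-scale information table (MSIT):
# each object takes one value per attribute *per scale*,
# with scale 1 the finest and higher scales progressively coarser.
msit = {
    # object: {attribute: (value at scale 1, value at scale 2, value at scale 3)}
    "x1": {"location": ("Manhattan", "New York", "USA"),
           "temperature": (36.4, "36-37", "normal")},
    "x2": {"location": ("Brooklyn", "New York", "USA"),
           "temperature": (38.1, "38-39", "high")},
    "x3": {"location": ("Cambridge", "Boston", "USA"),
           "temperature": (36.8, "36-37", "normal")},
}

def restrict_to_scales(table, scales):
    """Extract a single-scale information table (SSIT) by fixing,
    for every attribute, one of its scales (1-based indices)."""
    return {
        obj: {a: values[scales[a] - 1] for a, values in row.items()}
        for obj, row in table.items()
    }

# One SSIT out of the 3 x 3 possible scale combinations:
print(restrict_to_scales(msit, {"location": 2, "temperature": 3}))
```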

A simple example: the point 40°44′54.74′′N, 73°59′10.88′′W can be located in the Northern Hemisphere or the Western Hemisphere as a very coarse granule, in the United States as a coarse granule, in New York as a fine granule, or in Manhattan as a finer granule; in fact, it is the approximate geographic coordinate of the Empire State Building. For a given subset of attributes, two different scale combinations may induce families of granules of which one is either a refinement or a coarsening of the other. Hierarchically structured data usually contain a lot of useful information about the objects we are interested in, but they also contain redundancy. Therefore, the key idea of the MSIT is the constitution of subsystems, i.e., the extraction of SSITs by restricting each attribute to one of its scales; operations of attribute reduction and knowledge acquisition are then carried out in a proper SSIT selected according to some rule. Thus choosing a proper decision table decomposed from an MSIT is an important issue in processing multi-scale information tables. In fact, finer granules mean more cost, while coarse granules may fail to capture some useful information, so an appropriate level of granulation should be selected to approximate subsets of the universe of discourse. At one extreme, if a decision table is obtained by selecting every attribute at its finest scale, the equivalence granules are the finest and capture the most information for knowledge acquisition. However, this may cause economic losses, because improving the accuracy of measurement means more cost, and the same classification effect can often be achieved without taking some attributes at their finest scales. At the other extreme, if a decision table is obtained by selecting every attribute at its coarsest scale, the equivalence granules are the coarsest and will perform poorly in decision making. Thus optimal scale selection is a critical issue in processing MSITs. If S is an MSIT, the basic procedure for processing S is as follows (a minimal sketch is given after the list).

  • Based on the levels of scales of the attributes, S is decomposed into a number of SSITs by restricting each attribute to one of its scales.

  • According to given rules, an appropriate decision table is selected from these SSITs for decision making.

  • In the selected subsystem, the operations of attribute reduction and rule extraction can be carried out.
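As a hedged illustration of this procedure (the selection rule used here is simply to keep the decision-consistent combinations, which is only one of the criteria considered in the paper, and all data are hypothetical), the following sketch enumerates every scale combination of a toy multi-scale decision table and keeps the admissible ones:

```python
from itertools import product

# Hypothetical toy multi-scale decision table: each condition attribute
# carries (scale-1, scale-2) values; d is the single-scale decision.
objects = {
    "x1": {"a1": (1, "L"), "a2": (3, "H"), "d": "yes"},
    "x2": {"a1": (2, "L"), "a2": (3, "H"), "d": "no"},
    "x3": {"a1": (5, "H"), "a2": (1, "L"), "d": "no"},
    "x4": {"a1": (6, "H"), "a2": (1, "L"), "d": "no"},
}
attributes, n_scales = ["a1", "a2"], 2

def is_consistent(scale_combo):
    """A combination is admissible here if objects sharing the same
    condition values (at the chosen scales) share the same decision."""
    seen = {}
    for row in objects.values():
        key = tuple(row[a][s - 1] for a, s in zip(attributes, scale_combo))
        if seen.setdefault(key, row["d"]) != row["d"]:
            return False
    return True

# Step 1: decompose S into one SSIT per scale combination.
# Step 2: select the admissible combinations; step 3 (attribute
# reduction and rule extraction) would then run on the chosen SSIT.
combos = list(product(range(1, n_scales + 1), repeat=len(attributes)))
print([c for c in combos if is_consistent(c)])   # -> [(1, 1), (1, 2)]
```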

In an MSIT, for any attribute, one may regard a refined attribute value as a revision of a coarser attribute value; an MSIT can then be treated as an information system that is dynamically updated over time, and some incremental learning algorithms can be used for attribute reduction [15], [37]. However, in that case the original static, hierarchically structured information in the MSIT would be replaced by dynamic monotonic information; in other words, the hierarchical information would be lost, so this case is not the focus of the paper. In this paper, we only consider the static approach to scale selection from the viewpoint of the hierarchical structure; the dynamic case will be considered in future work.

Wu and Leung assumed that all attributes have the same number of levels of scales and studied optimal scale selection for such special multi-scale decision tables (MSDTs) [41], [44]. Under the same assumption, Gu et al. [8], [9] and She et al. [35] studied knowledge acquisition and rule induction in MSDTs. Furthermore, Li and Hu extended the theory and application of MSDTs to diverse attributes with different numbers of levels of scales [18]. Subsequently, the Wu–Leung model, the complement model and the lattice model proposed in [18], [42] have performed well in solving the problem of optimal scale selection. In fact, the Wu–Leung model [42] mainly studies optimal scale selection for multi-scale decision tables under the previous assumption from the viewpoint of the standard rough set model and a dual probabilistic rough set model, while the complement model and the lattice model [18] study the general case; the differences between them are discussed in [18]. In this paper, we always study the general case, following [18]. In particular, the lattice model is able to find all the optimal scale combinations among all combinations, but it is time-consuming (e.g. see Table 7), so a faster approach is urgently needed. Besides, another problem concerns us: as an MSDT is a special information table, how should the concept of multi-scale attribute significance be defined, given that attribute significance is an important property in processing decision tables? Motivated by these issues, in this paper we extend attribute significance in single-scale decision tables (SSDTs) to multi-scale attribute significance in MSDTs, and give another two equivalent definitions in the sense of binary classification. Furthermore, based on the notion of multi-scale attribute significance, we propose a novel stepwise optimal scale selection approach to obtain one optimal scale combination in MSDTs. Compared with the lattice model in [18], the new approach has a great advantage in time cost (e.g. see Table 7). Finally, real-life experiments are employed to illustrate its feasibility and efficiency.
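To convey the intuition behind stepwise selection, here is a rough greedy sketch under the assumption that attributes are visited in ascending order of significance and each is coarsened as far as consistency allows; the data, names and the exact criterion are hypothetical, and the algorithms in Section 5 define the actual procedure.

```python
def stepwise_scale_selection(rows, attributes, n_scales, order):
    """Greedy sketch: start from the finest scale combination and, visiting
    attributes in ascending order of significance, coarsen each one as far
    as the consistency of the resulting decision table allows."""

    def consistent(combo):
        seen = {}
        for row in rows:
            key = tuple(row[a][combo[a] - 1] for a in attributes)
            if seen.setdefault(key, row["d"]) != row["d"]:
                return False
        return True

    combo = {a: 1 for a in attributes}            # 1 = finest scale
    for a in order:                               # least significant first
        for scale in range(n_scales, 1, -1):      # try the coarsest first
            if consistent({**combo, a: scale}):
                combo[a] = scale
                break
    return combo

# Hypothetical data: each condition attribute holds (scale-1, scale-2) values.
rows = [
    {"a1": (1, "L"), "a2": (3, "H"), "d": "yes"},
    {"a1": (2, "L"), "a2": (3, "H"), "d": "no"},
    {"a1": (5, "H"), "a2": (1, "L"), "d": "no"},
]
print(stepwise_scale_selection(rows, ["a1", "a2"], 2, order=["a2", "a1"]))
# -> {'a1': 1, 'a2': 2}: a2 can be coarsened, a1 must stay at its finest scale
```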

The remainder of the paper is organized as follows. In Section 2, several basic notions of Pawlak's rough sets and information systems are reviewed. In Section 3, some concepts of multi-scale information tables and multi-scale attribute significance are introduced. In Section 4, the concept of stepwise optimal scale selection for consistent and inconsistent MSDTs and another two equivalent definitions of multi-scale attribute significance are proposed. Five algorithms for computing stepwise optimal scale selection are given in Section 5, and six real-life experiments are employed to test these algorithms in Section 6. Finally, we conclude the paper with a summary and an outlook on further work in Section 7.

Section snippets

Preliminaries

In this section, we review several basic concepts and introduce some notions of Pawlak's rough sets and information tables.

Related to multi-scale information system

In this section, we review some concepts of multi-scale information tables. Furthermore, we propose, for the first time, the notion of multi-scale attribute significance.

Stepwise optimal scale selection in multi-scale decision table

In this section, we briefly recall the lattice model for computing all optimal scale combinations in an MSDT. In addition, we propose, for the first time, stepwise optimal scale selection for faster computation of one optimal scale combination in consistent and inconsistent multi-scale decision tables.

Algorithms for stepwise optimal scale selection

In this section, we propose some algorithms to compute the stepwise optimal scale combination based on attribute significance in consistent and inconsistent decision tables.

First, Algorithm 1 is used to compute the positive region for a given single-scale decision table. Similar to Algorithm 1 in [18], its complexity is O(|U|^2). Algorithm 2 is designed to sort the attributes of a given multi-scale decision table in ascending order of significance.
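Algorithm 1 itself is not reproduced in this preview; as a hedged illustration, the following sketch computes a positive region for a toy single-scale decision table by pairwise comparison of objects, which exhibits the stated O(|U|^2) behaviour (the data and function name are hypothetical).

```python
def positive_region(rows, condition_attrs, decision_attr="d"):
    """Objects whose condition class is decision-consistent, i.e. every
    object indiscernible from them has the same decision value.
    Pairwise comparison of all objects gives O(|U|^2) time."""
    pos = []
    for i, row in enumerate(rows):
        consistent = all(
            other[decision_attr] == row[decision_attr]
            for other in rows
            if all(other[a] == row[a] for a in condition_attrs)
        )
        if consistent:
            pos.append(i)
    return pos

# Hypothetical single-scale decision table.
table = [
    {"a1": "L", "a2": "H", "d": "yes"},
    {"a1": "L", "a2": "H", "d": "no"},   # conflicts with the first object
    {"a1": "H", "a2": "L", "d": "no"},
]
print(positive_region(table, ["a1", "a2"]))   # -> [2]
```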

Next, we analyze the

Case study

In order to describe the mechanism of stepwise optimal scale selection more clearly, Example 6.1 is employed to illustrate the detailed processing of the proposed algorithm.

Example 6.1

Find one optimal scale combination of the multi-scale decision table presented in Table 6 by Algorithm 5. From the table, we know that the multi-scale decision table S = (U, C ∪ {d}) is inconsistent, since x4 and x6 are indistinguishable w.r.t. R_C but d(x4) ≠ d(x6), where U = {x1, x2, …, x20} and C = {a1, a2, a3, a4}.
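Table 6 itself is not shown in this preview; as a hedged sketch with hypothetical stand-in values, the following check illustrates how such an inconsistency (two objects indiscernible w.r.t. R_C but with different decision values) would be detected at a given scale combination.

```python
def find_conflicts(rows, condition_attrs, decision_attr="d"):
    """Return pairs of object names that are indiscernible on the
    condition attributes but carry different decision values."""
    conflicts = []
    names = list(rows)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            same_condition = all(rows[u][a] == rows[v][a] for a in condition_attrs)
            if same_condition and rows[u][decision_attr] != rows[v][decision_attr]:
                conflicts.append((u, v))
    return conflicts

# Hypothetical stand-in for Table 6 at the chosen scale combination:
# x4 and x6 share all condition values but disagree on d.
rows = {
    "x4": {"a1": 1, "a2": "L", "a3": 0, "a4": "S", "d": "yes"},
    "x6": {"a1": 1, "a2": "L", "a3": 0, "a4": "S", "d": "no"},
    "x7": {"a1": 2, "a2": "H", "a3": 1, "a4": "T", "d": "no"},
}
print(find_conflicts(rows, ["a1", "a2", "a3", "a4"]))   # -> [('x4', 'x6')]
```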

  • Compute the ascending order τ of attribute significances.

    In S=

Conclusions

The multi-scale information system serves as an extension of the information system, in which varying values can be taken under the same attribute for each object at different levels of scales. Based on the concept of attribute significance in a single-scale decision table, multi-scale attribute significance in a multi-scale decision table is proposed. To some extent, another two equivalent definitions based on the lower approximation distribution and upper approximation distribution in inconsistent multi-scale

Acknowledgements

The authors thank the editors and anonymous reviewers for their most valuable comments and suggestions in improving this paper. This research was supported by the National Natural Science Foundation of China (Grant Nos. 11571010, 61179038).

References (60)

  • S. Li et al.

    Incremental update of approximations in dominance-based rough sets approach under the variation of attribute values

    Inf. Sci.

    (2015)
  • M. Lichman

    UCI machine learning repository

    (2013)
  • T.Y. Lin et al.

    Data Mining, Rough Sets and Granular Computing

    (2002)
  • Z. Pawlak

    Rough sets

    Int. J. Comput. Inf. Sci.

    (1982)
  • S. Salehi et al.

    Systematic mapping study on granular computing

    Knowl. Based Syst.

    (2015)
  • Y. She et al.

    A local approach to rule induction in multi-scale decision tables

    Knowl. Based Syst.

    (2015)
  • S. Wang et al.

    Efficient updating rough approximations with multi-dimensional variation of ordered data

    Inf. Sci.

    (2016)
  • L.-L. Wei et al.

    Probabilistic rough sets characterized by fuzzy sets

  • S. Wong et al.

    Comparison of the probabilistic approximate classification and the fuzzy set model

    Fuzzy Sets Syst.

    (1987)
  • W.-Z. Wu

    Upper and lower probabilities of fuzzy events induced by a fuzzy set-valued mapping

  • W.Z. Wu et al.

    Theory and applications of granular labelled partitions in multi-scale decision tables

    Inf. Sci.

    (2011)
  • W.-Z. Wu et al.

    Granular computing and knowledge reduction in formal contexts

    IEEE Trans. Knowl. Data Eng.

    (2008)
  • W. Xu et al.

    A novel cognitive system model and approach to transformation of information granules

    Int. J. Approximate Reasoning

    (2014)
  • W. Xu et al.

    A novel approach to information fusion in multi-source datasets: a granular computing viewpoint

    Inf. Sci.

    (2017)
  • Y.Y. Yao

    Rough sets, neighborhood systems and granular computing

    Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering

    (1999)
  • Y.Y. Yao

    Stratified rough sets and granular computing

  • Y.Y. Yao

    Probabilistic approaches to rough sets

    Expert Syst.

    (2003)
  • L.A. Zadeh

    Fuzzy sets and information granularity

    N. Gupta, R. Ragade, R. Yager (Eds.), Advances in Fuzzy...

    (1979)
  • X. Zhang et al.

    Quantitative information architecture, granular computing and rough set models in the double-quantitative approximation space of precision and grade

    Inf. Sci.

    (2014)
  • D. Zhou et al.

    Combining granular computing and RBF neural network for process planning of part features

    Int. J. Manuf. Technol.

    (2015)