Elsevier

Knowledge-Based Systems

Volume 24, Issue 6, August 2011, Pages 858-867

Neighborhood systems-based rough sets in incomplete information system

https://doi.org/10.1016/j.knosys.2011.03.007

Abstract

A neighborhood system formalizes the ancient intuition of infinitesimals, which led to the invention of calculus, topology and non-standard analysis. In this paper, the neighborhood system is studied from the viewpoint of knowledge engineering, and each neighborhood is regarded as a basic unit of knowledge. Using this knowledge in the neighborhood system, the rough approximations and their properties are discussed. It is shown that, in an incomplete information system, neighborhood system based rough sets yield smaller upper approximations than the methods in [Y. Leung, D.Y. Li, Maximal consistent block technique for rule acquisition in incomplete information systems, Information Sciences 153 (2003) 85–106] and [Y. Leung, W.Z. Wu, W.X. Zhang, Knowledge acquisition in incomplete information systems: a rough set approach, European Journal of Operational Research 168 (2006) 164–180]. Furthermore, a new knowledge operation on the neighborhood system is discussed, through which more knowledge can be derived from the initial neighborhood system. Through such operations, the lower approximations are further enlarged and the upper approximations are further narrowed. Some numerical examples are employed to substantiate the conceptual arguments.

Introduction

Rough set theory [18], [19], [20], [21], proposed by Pawlak, is an important mathematical tool for dealing with vague and uncertain information. Its key notions, the lower and upper approximation operators, were originally constructed on the basis of an indiscernibility relation, i.e. an equivalence relation (reflexive, symmetric and transitive). The classical rough set model has accordingly proved useful for discovering decision rules in complete information systems.
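The lower and upper operations of Pawlak's model can be sketched as follows, assuming the equivalence classes of the indiscernibility relation are already given as a partition of U. The lower approximation collects the classes that lie entirely inside the target set X; the upper approximation collects the classes that meet X. The universe, partition and target set below are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def pawlak_approximations(
    partition: Iterable[Iterable[Hashable]], target: Iterable[Hashable]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Lower/upper approximations of `target` with respect to a partition of U."""
    target = set(target)
    lower: Set[Hashable] = set()
    upper: Set[Hashable] = set()
    for block in partition:
        block = set(block)
        if block <= target:   # class entirely inside X -> certainly in X
            lower |= block
        if block & target:    # class meets X -> possibly in X
            upper |= block
    return lower, upper


# Hypothetical universe {1, ..., 6} partitioned by some indiscernibility relation.
partition = [{1, 2}, {3}, {4, 5, 6}]
X = {1, 2, 3, 4}
low, up = pawlak_approximations(partition, X)
print(sorted(low))  # [1, 2, 3]
print(sorted(up))   # [1, 2, 3, 4, 5, 6]
```

Object 4 is the only member of X that is not certainly in X, because its class {4, 5, 6} straddles the boundary; that is what makes X rough with respect to this partition.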

Gore, in his influential book Earth in the Balance [5], notes that “We must acknowledge that we never have complete information. Yet we have to make decisions anyway …”. This quote illustrates not only the difficulty of making decisions about environmental issues, but also the fact that making such decisions with partial information is ultimately inevitable [1]. Applying rough set techniques to incomplete information systems is therefore an important topic in the development of rough set theory. In most of the rough set literature, an incomplete information system is a system with unknown values. For an incomplete information system, unknown values may have two semantic interpretations [2], [3]: in the first case, all unknown values are “do not care” conditions; in the second case, all unknown values are lost. A “do not care” unknown value is the “everything is possible” value. This interpretation corresponds to the idea that the value is merely “missing”, but it does exist. On the other hand, if an unknown value is regarded as lost, objects may be described incompletely not only because of our imperfect knowledge, but also because it may be impossible to describe them with all attributes [29]. A lost value is therefore a non-existing one, and it is not comparable with any other value in the domain of the corresponding attribute [27], [28], [30], [32].

In this paper, all unknown values in the incomplete information system are considered as “do not care”. Under this assumption, many important rough set results have been obtained. For example, the first attempt to study “do not care” conditions using rough set theory was presented in Ref. [4], where a rule induction method was introduced in which unknown values were replaced by all values in the domain of the corresponding attribute. Following this work, Kryszkiewicz [8] proposed a tolerance relation for incomplete information systems. By considering preference-ordered attribute domains in incomplete information systems, Shao and Zhang [26] proposed an expanded dominance relation and the corresponding dominance-based rough approximations, and Yang et al. [31], [35] investigated the acquisition of optimal rules. By introducing a basic concept of discrete mathematics into incomplete information systems, Leung and Li [9] proposed the maximal consistent block based rough approximation. Guan et al. further introduced the maximal consistent block into set-valued and continuous-valued systems in Refs. [6], [7], respectively. Qian et al. [22] introduced approximate distribution reducts into incomplete information systems on the basis of the maximal consistent block based rough approximation. Moreover, Leung et al. proposed the descriptor-based rough approximation for incomplete information systems in Ref. [10].
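The "do not care" semantics and the tolerance relation mentioned above can be illustrated with a small sketch. We assume the usual form of Kryszkiewicz's relation: two objects are tolerant on an attribute set when, on every attribute, their values coincide or at least one of them is the unknown value `*`. The table, object names and the helper names `tolerant` and `tolerance_classes` are hypothetical.

```python
from typing import Dict, Hashable, List, Set

STAR = "*"  # the "do not care" unknown value

Table = Dict[str, Dict[str, Hashable]]


def tolerant(table: Table, x: str, y: str, attrs: List[str]) -> bool:
    """On every attribute, the values agree or one of them is '*'."""
    return all(
        table[x][a] == table[y][a] or STAR in (table[x][a], table[y][a])
        for a in attrs
    )


def tolerance_classes(table: Table, attrs: List[str]) -> Dict[str, Set[str]]:
    """T(x) = set of objects tolerant with x (x itself is always included)."""
    return {x: {y for y in table if tolerant(table, x, y, attrs)} for x in table}


# Hypothetical incomplete information system: objects x1..x4, attributes a, b.
table: Table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": STAR},
    "x3": {"a": STAR, "b": 1},
    "x4": {"a": 2, "b": 1},
}
classes = tolerance_classes(table, ["a", "b"])
for x in sorted(classes):
    print(x, sorted(classes[x]))
# x1 ['x1', 'x2']
# x2 ['x1', 'x2', 'x3']
# x3 ['x2', 'x3', 'x4']
# x4 ['x3', 'x4']
```

Here x1 is tolerant with x2 and x2 with x3, yet x1 is not tolerant with x3: the relation is reflexive and symmetric but not transitive, so the tolerance classes overlap instead of partitioning U.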

The main purpose of this paper is to provide a rough set approach to incomplete information systems from the viewpoint of neighborhood system theory [11], [12], [13], [33], [37], [38]. The neighborhood system is a pre-Granular Computing (GrC) [14], [15], [16], [17] concept; it is also the first model of GrC. Mathematically, a neighborhood system is a model that formalizes an ancient intuition, infinitesimal granules, which led to the invention of calculus, topology and non-standard analysis. Roughly speaking, a neighborhood system assigns to each object a (possibly finite or infinite) family of subsets. Each subset is referred to as a neighborhood, which can be used to represent the semantics of “near” [12], [33]. The classification analysis of rough set theory can be subsumed by neighborhood system theory, because each object in the system is related to one or more classes. For instance, in Pawlak’s rough set, each object has an equivalence class; in a covering approximation space, each object belongs to at least one block of the covering.
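For finite universes, a neighborhood system can be sketched as a map from each object to a list of its neighborhoods. The representation below (a dict from object to a list of frozensets) and the function names are our own assumptions, and the sketch covers only the finite case. It shows the two special cases named above: Pawlak's model, where each object's single neighborhood is its equivalence class, and a covering, where each object's neighborhoods are the blocks containing it.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List

NeighborhoodSystem = Dict[Hashable, List[FrozenSet[Hashable]]]


def ns_from_partition(partition: Iterable[Iterable[Hashable]]) -> NeighborhoodSystem:
    """Pawlak case: each object has exactly one neighborhood, its equivalence class."""
    ns: NeighborhoodSystem = {}
    for block in partition:
        cls = frozenset(block)
        for x in cls:
            ns[x] = [cls]
    return ns


def ns_from_covering(
    covering: Iterable[Iterable[Hashable]], universe: Iterable[Hashable]
) -> NeighborhoodSystem:
    """Covering case: the neighborhoods of x are all blocks that contain x."""
    blocks = [frozenset(b) for b in covering]
    ns = {x: [b for b in blocks if x in b] for x in universe}
    uncovered = [x for x, nbhds in ns.items() if not nbhds]
    if uncovered:
        raise ValueError(f"not a covering of U; uncovered objects: {uncovered}")
    return ns


ns_p = ns_from_partition([{1, 2}, {3}])
ns_c = ns_from_covering([{1, 2}, {2, 3}], [1, 2, 3])
print(ns_c[2])  # [frozenset({1, 2}), frozenset({2, 3})]
```

In the covering case object 2 receives two neighborhoods, which is exactly the extra generality a neighborhood system offers over a partition.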

Given an incomplete information system, the maximal consistent blocks induce a covering rather than a partition of the universe of discourse, since the tolerance relation underlying them is reflexive and symmetric but not necessarily transitive. Similarly, the support sets of the descriptors also induce a covering of the universe of discourse. Two different coverings are therefore generated by two different approaches. Whichever covering is selected, we can form a neighborhood system: for each object in the universe, its neighborhood system is the collection of blocks that contain it, and the neighborhood systems of all objects together form a neighborhood system on the universe. This allows us to re-investigate the incomplete information system from the viewpoint of the neighborhood system.
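The construction just described can be sketched in two steps. We assume Leung and Li's notion of a consistent block as a set of pairwise tolerant objects, so maximal consistent blocks are the maximal cliques of the tolerance graph; the sketch finds them with the basic Bron–Kerbosch recursion, a standard clique algorithm chosen here for brevity rather than the authors' own procedure. Each object's neighborhood system is then the list of blocks containing it. The table and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

STAR = "*"  # "do not care" unknown value
Table = Dict[str, Dict[str, Hashable]]


def tolerant(table: Table, x: str, y: str, attrs: List[str]) -> bool:
    """Tolerance: on each attribute the values agree or one of them is '*'."""
    return all(table[x][a] == table[y][a] or STAR in (table[x][a], table[y][a])
               for a in attrs)


def maximal_consistent_blocks(table: Table, attrs: List[str]) -> List[FrozenSet[str]]:
    """Maximal cliques of the tolerance graph (basic Bron-Kerbosch, no pivoting)."""
    objs = list(table)
    nbrs = {x: {y for y in objs if y != x and tolerant(table, x, y, attrs)} for x in objs}
    blocks: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], excluded: Set[str]) -> None:
        if not p and not excluded:
            blocks.append(frozenset(r))  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & nbrs[v], excluded & nbrs[v])
            p = p - {v}
            excluded = excluded | {v}

    expand(set(), set(objs), set())
    return blocks


def ns_from_blocks(blocks, universe) -> Dict[str, List[FrozenSet[str]]]:
    """Neighborhood system: the neighborhoods of x are the blocks containing x."""
    return {x: [b for b in blocks if x in b] for x in universe}


table: Table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": STAR},
    "x3": {"a": STAR, "b": 1},
    "x4": {"a": 2, "b": 1},
}
blocks = maximal_consistent_blocks(table, ["a", "b"])
ns = ns_from_blocks(blocks, table)
print(sorted(sorted(b) for b in blocks))
# [['x1', 'x2'], ['x2', 'x3'], ['x3', 'x4']]
```

The three blocks overlap (x2 and x3 each lie in two of them), so they form a covering rather than a partition, and x2 and x3 each end up with two neighborhoods.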

The formal GrC model has three semantic views: “Knowledge Engineering”, “Uncertainty Theory” and “How to solve/computing it” [15]. Although neighborhood systems were motivated by uncertainty, in this paper we view them from the perspective of knowledge engineering. Thus, each neighborhood (an element of a neighborhood system) is regarded as a set of data that carries a unit of basic knowledge. Within this knowledge engineering view, Lin [15] proposed a new knowledge operation on neighborhood systems. In this paper, we discuss some mathematical properties of this operation, through which new knowledge can be derived.

In Section 2, we first present some basic concepts related to incomplete information systems, such as the maximal consistent block technique and descriptors. Since a covering is a special form of neighborhood system, Section 3 shows how the coverings induced by maximal consistent blocks and by support sets of the descriptors can be transformed into neighborhood systems. We then use these two neighborhood systems to construct new rough sets in the incomplete information system and examine their relationships with the previous rough sets based on maximal consistent blocks and descriptors. In Section 4, Lin’s knowledge operation is further explored. An immediate result of the knowledge operation is that new knowledge (neighborhoods) is obtained. Adding this new knowledge to the original neighborhood systems generates expanded neighborhood systems. Compared with the initial neighborhood system based rough approximations, the expanded neighborhood systems yield bigger lower approximations and smaller upper approximations; it follows that the expanded neighborhood system based rough set is better than the initial one for dealing with incomplete information systems. Section 5 concludes the paper with a summary and an outlook on further research.
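The comparison between block-based and neighborhood system based rough sets can be illustrated under assumed definitions, since the exact formulas of Section 3 are not reproduced here. For the block-based model we assume the common form: the lower approximation is the union of blocks inside X, and the upper approximation is the union of blocks meeting X. For the neighborhood system we assume the usual dual pair: x is in the lower approximation if some neighborhood of x lies inside X, and in the upper approximation if every neighborhood of x meets X. The blocks and target set are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def mcb_approximations(blocks: List[FrozenSet[str]], target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Assumed block-based form: union of blocks inside X / union of blocks meeting X."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for b in blocks:
        if b <= target:
            lower |= b
        if b & target:
            upper |= b
    return lower, upper


def ns_approximations(ns: Dict[str, List[FrozenSet[str]]], target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Assumed neighborhood-system form:
    lower: SOME neighborhood of x lies inside X; upper: EVERY neighborhood of x meets X."""
    lower = {x for x, nbhds in ns.items() if any(n <= target for n in nbhds)}
    upper = {x for x, nbhds in ns.items() if all(n & target for n in nbhds)}
    return lower, upper


blocks = [frozenset({"x1", "x2"}), frozenset({"x2", "x3"}), frozenset({"x3", "x4"})]
universe = ["x1", "x2", "x3", "x4"]
ns = {x: [b for b in blocks if x in b] for x in universe}
X = {"x1", "x2"}

mcb_low, mcb_up = mcb_approximations(blocks, X)
ns_low, ns_up = ns_approximations(ns, X)
print(sorted(mcb_up))  # ['x1', 'x2', 'x3']
print(sorted(ns_up))   # ['x1', 'x2']
```

Object x3 enters the block-based upper approximation because one of its blocks meets X, but it is excluded by the neighborhood system version because its other neighborhood {x3, x4} does not meet X. With these assumed definitions the two lower approximations coincide, while the neighborhood system upper approximation is smaller.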

Section snippets

Pawlak’s rough set

Formally, an information system can be considered as a quadruple $I = \langle U, AT, V, f\rangle$, where

  • U is a non-empty finite set of objects, called the universe;

  • AT is a non-empty finite set of condition attributes, and for each $a \in AT$, $V_a$ is the domain of the attribute a;

  • V is the domain of all attributes, i.e. $V = V_{AT} = \bigcup_{a \in AT} V_a$;

  • f is an information function such that $f(x, a) \in V_a$ for each $x \in U$ and each $a \in AT$.
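The quadruple above can be stored as a small data structure, sketched below under stated assumptions: f is kept as a nested table f[x][a], and each domain $V_a$ is approximated by the values that actually occur, since the definition leaves $V_a$ abstract. Because the snippet is truncated before the relation is defined, the `partition` method assumes Pawlak's standard indiscernibility relation (identical values on every chosen attribute). The class name, methods and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, List, Set


@dataclass
class InformationSystem:
    """I = <U, AT, V, f>, with f stored as a nested table f[x][a]."""
    universe: List[Hashable]
    attributes: List[Hashable]
    f: Dict[Hashable, Dict[Hashable, Hashable]]

    def domain(self, a: Hashable) -> Set[Hashable]:
        """V_a, approximated here by the values that actually occur (an assumption)."""
        return {self.f[x][a] for x in self.universe}

    def partition(self, attrs: List[Hashable]) -> List[FrozenSet[Hashable]]:
        """U/IND(attrs): objects with identical values on all of attrs share a class."""
        classes: Dict[tuple, Set[Hashable]] = {}
        for x in self.universe:
            key = tuple(self.f[x][a] for a in attrs)
            classes.setdefault(key, set()).add(x)
        return [frozenset(c) for c in classes.values()]


# Hypothetical complete information system.
I = InformationSystem(
    universe=["y1", "y2", "y3"],
    attributes=["a", "b"],
    f={"y1": {"a": 1, "b": 0}, "y2": {"a": 1, "b": 0}, "y3": {"a": 2, "b": 1}},
)
print(sorted(sorted(c) for c in I.partition(["a", "b"])))  # [['y1', 'y2'], ['y3']]
```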

For an information system I, one can describe the relationship between objects through their condition attribute values. With

From GrC to neighborhood system

Neighborhood system theory can be derived from the theory of Granular Computing (GrC). In GrC theory, Lin has presented nine formal GrC models. In the following, we present only the fourth GrC model, which is used to introduce the neighborhood system.

Definition 6

[16]

Fourth GrC model:

  • 1.

    $U = \{x_1, x_2, \ldots\}$ is the universe of discourse;

  • 2.

    $U \times U$ is the Cartesian product of U with itself;

  • 3.

    A binary relation R is a subset $R \subseteq U \times U$;

  • 4.

    $\beta = \{R_1, R_2, \ldots\}$ is a family of binary relations;

then the pair (U, β), called the Binary Granular Data Model, is a formal
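The binary granular data model can be connected to neighborhood systems with a short sketch. We assume the successor convention $n_R(x) = \{y : (x, y) \in R\}$ for the neighborhood that a relation R assigns to x (conventions on the direction vary; for symmetric relations they coincide). A family $\beta$ of relations then gives each object one neighborhood per relation. The relations and function names below are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Relation = Set[Tuple[Hashable, Hashable]]


def binary_neighborhood(relation: Relation, x: Hashable) -> FrozenSet[Hashable]:
    """Successor neighborhood n_R(x) = {y : (x, y) in R} (direction is an assumption)."""
    return frozenset(y for (u, y) in relation if u == x)


def ns_from_relations(
    universe: Iterable[Hashable], relations: List[Relation]
) -> Dict[Hashable, List[FrozenSet[Hashable]]]:
    """One neighborhood per relation in the family beta, for every object."""
    return {x: [binary_neighborhood(R, x) for R in relations] for x in universe}


U = [1, 2, 3]
R1: Relation = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}  # equivalence relation
R2: Relation = {(1, 1), (2, 2), (3, 3), (2, 3), (3, 2)}  # another equivalence relation
ns = ns_from_relations(U, [R1, R2])
print(ns[2])  # [frozenset({1, 2}), frozenset({2, 3})]
```

With a single equivalence relation in $\beta$ this reduces to Pawlak's case; with several relations, each object collects several neighborhoods.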

Knowledge operation and neighborhood system based rough set

In Lin’s Granular Computing theory, the formal GrC model has three semantic views: “Knowledge Engineering”, “Uncertainty Theory” and “How to solve/computing it” [15]. Although neighborhood systems were motivated by uncertainty, in this paper we view them from the perspective of knowledge engineering. Thus, each neighborhood (an element of a neighborhood system) is regarded as a set of data that carries a unit of basic knowledge.

Since we take the knowledge engineering view to consider neighborhood

Conclusions

In this paper, we have developed a general framework for the study of neighborhood system based rough sets in incomplete information systems. The main results are:

  • 1.

    First, using the coverings induced by maximal consistent blocks and by support sets of descriptors, two neighborhood system based rough set models are constructed. Compared with Leung’s maximal consistent block based and descriptor-based rough sets, smaller upper approximations can be obtained if the neighborhood

Acknowledgment

This work is supported by the Natural Science Foundation of China (No.60632050) and Postdoctoral Science Foundation of China (No.20100481149).

References (38)

  • Y.H. Qian et al.

    Approximation reduction in inconsistent incomplete decision tables

    Knowledge-Based Systems

    (2010)
  • X.B. Yang et al.

    Dominance-based rough set approach and knowledge reductions in incomplete ordered information system

    Information Sciences

    (2008)
  • X.B. Yang et al.

    Credible rules in incomplete decision system based on descriptors

    Knowledge-Based Systems

    (2009)
  • X.B. Yang et al.

    Dominance-based rough set approach to incomplete interval-valued information system

    Data & Knowledge Engineering

    (2009)
  • W.Z. Wu et al.

    Neighborhood operator systems and approximations

    Information Sciences

    (2002)
  • Y.Y. Yao

    Neighborhood systems and approximate retrieval

    Information Sciences

    (2006)
  • J.W. Grzymala-Busse

    Characteristic relations for incomplete data: a generalization of the indiscernibility relation

  • J.W. Grzymala-Busse

    Data with missing attribute values: generalization of indiscernibility relation and rule induction

  • J.W. Grzymala-Busse

    On the unknown attribute values in learning from examples
