Neighborhood systems-based rough sets in incomplete information system
Introduction
Rough set theory [18], [19], [20], [21], proposed by Pawlak, is an important mathematical tool for dealing with vague and uncertain information. The key notions of rough set theory, i.e. the lower and upper approximation operations, were first constructed on the basis of an indiscernibility relation (an equivalence relation, i.e. reflexive, symmetric and transitive). The classical rough set model has therefore been demonstrated to be useful in discovering decision rules from complete information systems.
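As a minimal sketch of the classical model, the following Python fragment computes the lower and upper approximations of a target set from the equivalence classes of an indiscernibility relation. The table, the attribute names `a` and `b`, and the function names are hypothetical and serve only as an illustration; the definitions used are the standard ones (an equivalence class belongs to the lower approximation if it is contained in the target set, and to the upper approximation if it intersects the target set).

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Partition the objects by their value tuple on the given attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def pawlak_approximations(table, attributes, target):
    """Return (lower, upper) approximations of `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:      # class entirely inside the target set
            lower |= block
        if block & target:       # class overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical complete information system (no unknown values).
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": 0, "b": 1},
    "x4": {"a": 0, "b": 0},
}

lower, upper = pawlak_approximations(table, ["a", "b"], {"x1", "x3"})
print(sorted(lower), sorted(upper))
```

Here x1 and x2 are indiscernible, so x1 cannot be placed in the lower approximation of {x1, x3}, while x2 is drawn into the upper approximation.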
Gore, in his influential book Earth in the Balance [5], notes that “We must acknowledge that we never have complete information. Yet we have to make decisions anyway …”. This quote illustrates not only the difficulty of making decisions about environmental issues, but also the fact that making such decisions with partial information is ultimately inevitable [1]. Therefore, how to employ the rough set technique to deal with incomplete information systems plays a very important role in the development of rough set theory. In most of the rough set literature, an incomplete information system is a system with unknown values. For an incomplete information system, unknown values may have two semantic explanations [2], [3]: in the first case, all unknown values are “do not care” conditions; in the second case, all unknown values are lost. The “do not care” unknown value is the “everything is possible” value. Such an interpretation corresponds to the idea that the unknown value is just “missing”, but it does exist. On the other hand, if the unknown value is regarded as lost, then objects may be described incompletely not only because of our imperfect knowledge, but also because it may be definitely impossible to describe them with all attributes [29]. It follows that a lost unknown value is a non-existing one, and it is not comparable with any other value in the domain of the corresponding attribute [27], [28], [30], [32].
In this paper, all unknown values in the incomplete information system are considered as “do not care” conditions. Under this assumption, many important rough set results have been obtained. For example, the first attempt to study “do not care” conditions using rough set theory was presented in Ref. [4], where a method for rule induction was introduced in which unknown values were replaced by all values in the domain of the corresponding attributes. Following this work, Kryszkiewicz [8] proposed her tolerance relation in the incomplete information system. By considering the preference-ordered domains of the attributes in the incomplete information system, Shao and Zhang [26] proposed an expanded dominance relation and the corresponding dominance-based rough approximations, and Yang et al. [31], [35] investigated an approach to the acquisition of optimal rules. By introducing a basic concept of discrete mathematics into the incomplete information system, Leung and Li [9] proposed the maximal consistent block based rough approximation. Guan et al. further introduced the maximal consistent block into set-valued and continuous valued systems in Refs. [6], [7], respectively. Qian et al. [22] introduced approximate distribution reducts into the incomplete information system in terms of the maximal consistent block based rough approximation. Moreover, Leung et al. also proposed the descriptor based rough approximation in the incomplete information system in Ref. [10].
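Under the “do not care” semantics, a tolerance relation of the kind attributed to Kryszkiewicz treats two objects as possibly indiscernible when, on every attribute, their values agree or at least one of them is unknown. The sketch below assumes that unknown values are encoded by the marker `"*"`; the table and the function names are hypothetical.

```python
MISSING = "*"  # assumed encoding of a "do not care" unknown value


def tolerant(row_x, row_y, attributes):
    """True if the two rows agree on every attribute up to unknown values."""
    return all(
        row_x[a] == row_y[a] or MISSING in (row_x[a], row_y[a])
        for a in attributes
    )


def tolerance_class(table, attributes, x):
    """All objects that are possibly indiscernible from x."""
    return {y for y in table if tolerant(table[x], table[y], attributes)}


# Hypothetical incomplete information system.
table = {
    "x1": {"a": 1, "b": "*"},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": "*", "b": 1},
    "x4": {"a": 0, "b": 1},
}
attrs = ["a", "b"]

for x in sorted(table):
    print(x, sorted(tolerance_class(table, attrs, x)))
```

In this example x2 is tolerant with x1 and x1 is tolerant with x3, yet x2 and x3 are not tolerant with each other. The relation is reflexive and symmetric but not transitive, so the tolerance classes overlap instead of forming a partition.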
The main purpose of this paper is to provide a rough set approach to the incomplete information system from the viewpoint of neighborhood system theory [11], [12], [13], [33], [37], [38]. The concept of a neighborhood system is a pre-Granular Computing (GrC) [14], [15], [16], [17] concept; it is also the first model for GrC. Mathematically, a neighborhood system is a model that formalizes an ancient intuition, infinitesimal granules, which led to the invention of calculus, topology and non-standard analysis. Roughly speaking, a neighborhood system assigns to each object a (finite or infinite) family of subsets. Each subset is referred to as a neighborhood, which can be used to represent the semantics of “near” [12], [33]. The classification analysis in rough set theory can be included in neighborhood system theory because each object in the system is related to one or more classes. For instance, in Pawlak’s rough set, each object has an equivalence class; in a covering approximation space, each object belongs to at least one block of the covering.
Given an incomplete information system, all maximal consistent blocks induce a covering instead of a partition on the universe of discourse, since the tolerance relation used in the maximal consistent block technique is only reflexive and symmetric. Moreover, the support sets of all descriptors also induce a covering on the universe of discourse. Therefore, two different coverings are generated by two different approaches. No matter which covering is selected, we can form a neighborhood system: for each object in the universe, its neighborhood system is the collection of those blocks that contain the object. The collection of the neighborhood systems of all objects forms a neighborhood system on the universe. This is why we can re-investigate the incomplete information system from the viewpoint of the neighborhood system.
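The construction just described can be sketched in Python as follows. The fragment assumes the usual reading of a maximal consistent block as a maximal set of pairwise tolerant objects, uses a brute-force enumeration that is only suitable for very small universes, and encodes unknown values by the marker `"*"`. The table and all function names are hypothetical.

```python
from itertools import combinations

MISSING = "*"  # assumed encoding of a "do not care" unknown value


def tolerant(row_x, row_y, attributes):
    """True if the two rows agree on every attribute up to unknown values."""
    return all(
        row_x[a] == row_y[a] or MISSING in (row_x[a], row_y[a])
        for a in attributes
    )


def maximal_consistent_blocks(table, attributes):
    """Brute force: maximal sets of pairwise tolerant objects."""
    objects = sorted(table)
    blocks = []
    # Enumerate from large to small, so that every consistent set which is
    # not contained in an already found block is itself maximal.
    for size in range(len(objects), 0, -1):
        for subset in combinations(objects, size):
            candidate = frozenset(subset)
            consistent = all(
                tolerant(table[x], table[y], attributes)
                for x, y in combinations(subset, 2)
            )
            if consistent and not any(candidate < b for b in blocks):
                blocks.append(candidate)
    return blocks


def neighborhood_system(table, covering):
    """Assign to each object the family of covering blocks that contain it."""
    return {x: {b for b in covering if x in b} for x in table}


# Hypothetical incomplete information system.
table = {
    "x1": {"a": 1, "b": "*"},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": "*", "b": 1},
    "x4": {"a": 0, "b": 1},
}

blocks = maximal_consistent_blocks(table, ["a", "b"])
ns = neighborhood_system(table, blocks)
for x in sorted(ns):
    print(x, sorted(sorted(b) for b in ns[x]))
```

The three blocks {x1, x2}, {x1, x3} and {x3, x4} overlap, so they form a covering and not a partition; x1 and x3 each receive two neighborhoods, while x2 and x4 receive one.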
The formal GrC model has three semantic views: “Knowledge Engineering”, “Uncertainty Theory” and “How to solve/computing it” [15]. Though the neighborhood system was motivated by uncertainty, we will view it from the perspective of knowledge engineering in this paper. Thus, each neighborhood (an element of a neighborhood system) is regarded as a set of data that carries a unit of basic knowledge. Using the knowledge engineering view, Lin [15] has proposed a new knowledge operation on the neighborhood system. In this paper, some mathematical properties of this new operation are discussed, from which new knowledge can be derived.
To facilitate our discussion, we first present some basic concepts related to the incomplete information system, such as the maximal consistent block technique and descriptors, in Section 2. Since a covering is a special form of the neighborhood system, in Section 3 we show how the coverings induced by maximal consistent blocks and by the support sets of the descriptors can be transformed into neighborhood systems. Moreover, we use these two neighborhood systems to construct new rough sets in the incomplete information system. The relationships between these new rough sets and the previous rough sets in terms of maximal consistent blocks and descriptors are then examined. In Section 4, Lin’s knowledge operation is further explored. An immediate result of the knowledge operation is that new knowledge (new neighborhoods) is obtained. If we add this new knowledge to the original neighborhood systems, then we generate the expanded neighborhood systems. Compared with the rough approximations based on the initial neighborhood systems, a bigger lower approximation and a smaller upper approximation are obtained in the expanded neighborhood systems. It follows that the expanded neighborhood system based rough set is better than the initial neighborhood system based rough set in dealing with incomplete information systems. We then conclude the paper with a summary and an outlook for further research in Section 5.
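The effect of expanding a neighborhood system can be illustrated with a small Python sketch. It rests on two assumptions that are not taken from the text above. First, the approximations follow a commonly used neighborhood system formulation: an object belongs to the lower approximation of X if at least one of its neighborhoods is contained in X, and to the upper approximation if every one of its neighborhoods intersects X. Second, the neighborhoods added in the expansion step are plain intersections of existing neighborhoods, a hypothetical stand-in and not Lin’s knowledge operation itself. The neighborhood system and all names are illustrative.

```python
def ns_lower(ns, target):
    """Objects with at least one neighborhood contained in the target set."""
    return {x for x, family in ns.items() if any(n <= target for n in family)}


def ns_upper(ns, target):
    """Objects all of whose neighborhoods intersect the target set."""
    return {x for x, family in ns.items() if all(n & target for n in family)}


def fs(*items):
    """Shorthand for building a neighborhood as a frozenset."""
    return frozenset(items)


# Hypothetical initial neighborhood system induced by a covering.
initial = {
    "x1": {fs("x1", "x2"), fs("x1", "x3")},
    "x2": {fs("x1", "x2")},
    "x3": {fs("x1", "x3"), fs("x3", "x4")},
    "x4": {fs("x3", "x4")},
}

# Expanded system: the original neighborhoods plus hypothetical new ones
# (here, the intersections of the neighborhoods of x1 and of x3).
expanded = {x: set(family) for x, family in initial.items()}
expanded["x1"].add(fs("x1"))
expanded["x3"].add(fs("x3"))

target = {"x1", "x4"}
print(sorted(ns_lower(initial, target)), sorted(ns_upper(initial, target)))
print(sorted(ns_lower(expanded, target)), sorted(ns_upper(expanded, target)))
```

Under these assumptions, adding neighborhoods can only enlarge the lower approximation (one more candidate for containment) and can only shrink the upper approximation (one more condition to satisfy), which agrees with the comparison stated above.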
Section snippets
Pawlak’s rough set
Formally, an information system can be considered as a quadruple I = 〈U, AT, V, f〉 where
• U is a non-empty finite set of objects, called the universe;
• AT is a non-empty finite set of condition attributes; ∀a ∈ AT, V_a is the domain of the attribute a;
• V is the domain of all attributes, such that V = V_AT = ⋃_{a∈AT} V_a;
• f is an information function, where f(x, a) ∈ V_a for each x ∈ U and each a ∈ AT.
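As an illustrative sketch only, the quadruple can be mirrored by a small Python data structure. The class name, field names and example values below are hypothetical; the attribute set AT is taken to be the key set of the domain mapping, and V is derived as the union of the attribute domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationSystem:
    universe: frozenset   # U: non-empty finite set of objects
    domains: dict         # maps each attribute a in AT to its domain V_a
    values: dict          # maps each pair (x, a) to the value f(x, a)

    @property
    def attributes(self):
        """AT, the set of condition attributes."""
        return frozenset(self.domains)

    @property
    def value_domain(self):
        """V, the union of all attribute domains V_a."""
        return frozenset().union(*self.domains.values())

    def f(self, x, a):
        """Information function; checks that f(x, a) lies in V_a."""
        value = self.values[(x, a)]
        if value not in self.domains[a]:
            raise ValueError(f"f({x}, {a}) = {value!r} is outside V_{a}")
        return value


# Hypothetical example with two objects and two attributes.
system = InformationSystem(
    universe=frozenset({"x1", "x2"}),
    domains={"color": {"red", "blue"}, "size": {1, 2}},
    values={
        ("x1", "color"): "red", ("x1", "size"): 1,
        ("x2", "color"): "blue", ("x2", "size"): 2,
    },
)
print(sorted(system.attributes), system.f("x1", "color"))
```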
For an information system I, one can describe the relationships between objects through their condition attribute values. With
From GrC to neighborhood system
Neighborhood system theory can be derived from the theory of Granular Computing (GrC). In GrC theory, Lin has presented nine formal GrC models. In the following, we present only the fourth GrC model in order to introduce the neighborhood system.
Definition 6 (Fourth GrC model [16])
U = {x_1, x_2, …} is the universe of discourse;
U × U is the Cartesian product of U with itself;
a binary relation R is a subset R ⊆ U × U;
β = {R_1, R_2, …} is a family of binary relations;
then the pair (U, β), called the Binary Granular Data Model, is a formal
Knowledge operation and neighborhood system based rough set
In Lin’s Granular Computing theory, the formal GrC model has three semantic views: “Knowledge Engineering”, “Uncertainty Theory” and “How to solve/computing it” [15]. Though the neighborhood system was motivated by uncertainty, we will view it from the perspective of knowledge engineering in this paper. Thus, each neighborhood (an element of a neighborhood system) is regarded as a set of data that carries a unit of basic knowledge.
Since we take the knowledge engineering view to consider neighborhood
Conclusions
In this paper, we have developed a general framework for the study of neighborhood system based rough sets in the incomplete information system. The main results are:
1. Firstly, by using the coverings induced by maximal consistent blocks and by support sets of descriptors, two forms of neighborhood system based rough set models are constructed. By comparing with Leung’s maximal consistent blocks and descriptors based rough sets, we can obtain the smaller upper approximations if the neighborhood
Acknowledgment
This work is supported by the Natural Science Foundation of China (No.60632050) and Postdoctoral Science Foundation of China (No.20100481149).
References (38)
- et al., Incomplete information, inferences, and individual differences: the case of environmental judgements, Organizational Behavior and Human Decision Processes (2000)
- et al., Set-valued information systems, Information Sciences (2006)
- et al., Attribute reduction and optimal decision rules acquisition for continuous valued information systems, Information Sciences (2009)
- Rough set approach to incomplete information systems, Information Sciences (1998)
- et al., Maximal consistent block technique for rule acquisition in incomplete information systems, Information Sciences (2003)
- et al., Knowledge acquisition in incomplete information systems: a rough set approach, European Journal of Operational Research (2006)
- Introduction to special issues on data mining and granular computing, International Journal of Approximate Reasoning (2005)
- et al., Rudiments of rough sets, Information Sciences (2007)
- et al., Rough sets: some extensions, Information Sciences (2007)
- et al., Rough sets and Boolean reasoning, Information Sciences (2007)
- Approximation reduction in inconsistent incomplete decision tables, Knowledge-Based Systems
- Dominance-based rough set approach and knowledge reductions in incomplete ordered information system, Information Sciences
- Credible rules in incomplete decision system based on descriptors, Knowledge-Based Systems
- Dominance-based rough set approach to incomplete interval-valued information system, Data & Knowledge Engineering
- Neighborhood operator systems and approximations, Information Sciences
- Neighborhood systems and approximate retrieval, Information Sciences
- Characteristic relations for incomplete data: a generalization of the indiscernibility relation
- Data with missing attribute values: generalization of indiscernibility relation and rule induction
- On the unknown attribute values in learning from examples