Dominance-based rough set approach to incomplete ordered information systems
Introduction
In 1982, Pawlak [37] introduced rough set theory (RST) as a formal mathematical tool for dealing with inconsistency and ambiguity in information systems [38]. The main advantage of the rough set approach is that it needs no preliminary or additional information about the data, such as probability distributions in probability theory, grades of membership in fuzzy set theory, or mass functions in the Dempster–Shafer theory of evidence. The starting point of the theory is the observation that objects having the same description are indiscernible in view of the available information about them [40], [49]. The resulting equivalence classes form a partition of the universe of discourse and constitute the basic granules of knowledge. The lower and upper approximations of sets, which are constructed from these equivalence classes, are the fundamental concepts of RST. To date, RST has been successfully applied in many areas, such as machine learning, pattern recognition, knowledge discovery, decision analysis and expert systems [2], [26], [27], [39], [52], [53], [54].
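The construction of lower and upper approximations from equivalence classes can be illustrated with a minimal sketch. The table, the attribute names and the target set `flu` below are hypothetical toy data, not taken from the paper.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group objects that have identical values on all given attributes."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        blocks[key].add(obj)
    return list(blocks.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:      # block lies entirely inside the target
            lower |= block
        if block & target:       # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical toy data: x1 and x2 are indiscernible.
table = {
    "x1": {"cough": "yes", "fever": "high"},
    "x2": {"cough": "yes", "fever": "high"},
    "x3": {"cough": "no", "fever": "normal"},
    "x4": {"cough": "no", "fever": "high"},
}
flu = {"x1", "x4"}
lower, upper = approximations(table, ["cough", "fever"], flu)
```

In this toy table only `x4` can be assigned to the target set with certainty, because `x1` shares its description with `x2`, which lies outside the set.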
However, the original rough set theory is not able to discover and process inconsistencies arising from the consideration of criteria, that is, attributes with preference-ordered domains (scales), such as test score, university ranking, and house price. To address this issue, Greco et al. [14], [15] proposed an extension of RST, called the dominance-based rough set approach (DRSA), that takes into account the ordering properties of criteria. This innovation is mainly based on substituting a dominance relation for the indiscernibility (or equivalence) relation, which permits approximations of ordered sets in multi-criteria decision making and multi-criteria sorting problems. In DRSA, where condition attributes are all criteria and decision classes are preference-ordered, the knowledge to be approximated is a collection of upward and downward unions of decision classes, and the granules of knowledge are dominance classes, i.e., sets of objects defined by a dominance relation. Moreover, DRSA takes into consideration monotonic relationships between the descriptions of objects on condition criteria and their class labels. Since its inception, DRSA has been extended to cope with knowledge acquisition in various types of ordered information systems (OISs) [3], [6], [12], [23], [33], [51], [59], [60], [64].
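A small sketch may help to make the dominance-based approximations concrete for a complete ordered decision table. It assumes that larger values are preferred on every criterion and that decision classes are encoded as integers. The table, the class labels and all function names are hypothetical.

```python
def dominates(table, criteria, x, y):
    """True if x is at least as good as y on every criterion."""
    return all(table[x][c] >= table[y][c] for c in criteria)


def dominating_set(table, criteria, x):
    """Objects that dominate x."""
    return {y for y in table if dominates(table, criteria, y, x)}


def dominated_set(table, criteria, x):
    """Objects that x dominates."""
    return {y for y in table if dominates(table, criteria, x, y)}


def upward_union(decision, t):
    """Objects assigned to class t or better."""
    return {x for x, cls in decision.items() if cls >= t}


def approximate_upward(table, criteria, decision, t):
    """Lower and upper approximations of the upward union of class t."""
    target = upward_union(decision, t)
    lower = {x for x in table if dominating_set(table, criteria, x) <= target}
    upper = {x for x in table if dominated_set(table, criteria, x) & target}
    return lower, upper


# Hypothetical toy data: s1 dominates s2 but has a worse class (inconsistency).
table = {
    "s1": {"math": 9, "physics": 8},
    "s2": {"math": 7, "physics": 7},
    "s3": {"math": 8, "physics": 9},
    "s4": {"math": 5, "physics": 4},
}
decision = {"s1": 2, "s2": 3, "s3": 3, "s4": 1}
lower, upper = approximate_upward(table, ["math", "physics"], decision, 3)
```

Here `s1` and `s2` fall between the two approximations because their evaluations and class labels violate the monotonic relationship.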
Furthermore, RST is founded on the assumption that some concrete information is associated with every object of the universe of discourse, and that all available objects are completely described by a set of attribute values, i.e., there are no unknown values on the condition attributes describing these objects. However, in many real-life applications, the information concerning the properties of objects is only partial, either because it is not possible to record the attribute values, or because it is definitely impossible to get a value on a given attribute for certain objects [16], [17], [49]. Such information systems are called incomplete. Roughly speaking, there are three main strategies for handling incomplete systems [31]: (I) completion, in which each unknown value is replaced by a specific value, such as the most common value, the mean, or the median of all known values of the attribute; (II) case deletion, in which all instances with an unknown value for at least one feature are ignored or discarded; (III) “best left alone”, in which the unknown value is treated as a special symbol and left unchanged when it is encountered. The first two methods inevitably destroy the original structure of the data. This paper focuses on the third strategy, in which the meaning of the symbol plays an important role. In essence, there exist two different semantic explanations for incomplete information:
- the “lost value” semantics (unknown values of attributes allow any comparison);
- the “absent value” semantics (unknown values of attributes do not allow any comparison).
The former interpretation is used when the original attribute value was recorded but later erased, or was not recorded at all because it was considered irrelevant to the final outcome. In this case, any value of the attribute may be used to replace a lost/missing attribute value. In the latter interpretation, the attribute value was not recorded for certain objects at a given moment although it is definitely relevant to the decision class. Hence the choice between the two interpretations should be made on the basis of additional information about the cause of the unknown attribute values [18]. For example, consider the problem of identifying patients with Flu, where one of the attributes related to the concept is Cough. If the symptom Cough was originally known but, for some reason, its value is currently not recorded, then both possible values Yes and No can be used for further analysis. This is not the case if a patient refuses to have his/her temperature taken or the test result is not at hand: for the sake of caution, decision makers (e.g., doctors) then cannot compare the unknown value with already known values.
For such incomplete information systems (IISs) there are two special cases: in the first, all unknown attribute values are lost, and in the second, all unknown attribute values are absent. For the former, Kryszkiewicz [28], [29] proposed the tolerance relation, while for the latter, Stefanowski and Tsoukiàs [49] proposed the similarity relation. These ideas were pursued further in the literature [11], [35], [50], [63]. For a more general IIS, the characteristic relation [16], [17] was used to construct the blocks of attribute-value pairs. A further investigation of the characteristic relation was made in [61]. Incremental methods for dynamic attribute generalization based on the characteristic relation were proposed in [30], [32], [46] under variations of objects, attributes or attribute values.
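The difference between the tolerance and similarity relations can be sketched as follows, using the definitions as they are commonly stated in the literature rather than any formulation specific to this paper. The sentinel `None` for an unknown value, the toy table and the function names are assumptions made for illustration.

```python
def tolerant(table, attributes, x, y):
    """Tolerance relation: values agree wherever both are known.

    An unknown value (None) is treated as compatible with anything,
    so the relation is symmetric.
    """
    return all(
        table[x][a] is None or table[y][a] is None or table[x][a] == table[y][a]
        for a in attributes
    )


def similar(table, attributes, x, y):
    """Similarity relation: x is similar to y.

    Every known value of x must be matched by the same known value of y.
    Unknown values of x are skipped, so the relation is not symmetric.
    """
    return all(
        table[x][a] == table[y][a]
        for a in attributes
        if table[x][a] is not None
    )


# Hypothetical toy data: x1 has an unknown fever value.
table = {
    "x1": {"cough": "yes", "fever": None},
    "x2": {"cough": "yes", "fever": "high"},
    "x3": {"cough": "no", "fever": "high"},
}
attrs = ["cough", "fever"]
```

With this data `x1` and `x2` are tolerant in both directions, while `x1` is similar to `x2` but `x2` is not similar to `x1`, since the known fever value of `x2` has no counterpart in `x1`.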
Correspondingly, three different formalisms for handling incomplete ordered information systems (IOISs) may be considered: expanded dominance relations [45], similarity dominance relations [62] and the characteristic-based dominance relations proposed in this paper. Shao and Zhang [45] presented an extension of the dominance relation in IOISs under the assumption that all unknown values are lost, whereas Yang et al. [62] introduced similarity dominance relations to study IOISs in which all unknown values adopt the “absent value” semantics. As mentioned before, the IOISs they examined are just the two extreme cases: neither work distinguishes the two semantics, and each adopts only one of them. In this paper we assume that an IOIS may have unknown values of both types, a more complex and general situation in which lost values and absent values coexist. The purpose of this paper is to solve problems in such a situation, and the characteristic-based dominance relation, which differs from the others, is proposed to handle this kind of system. On the other hand, instead of distinguishing a priori the semantics of unknown values, Błaszczyński et al. [5] proposed a family of approaches, denoted by DRSA-mv, built around desirable properties that the DRSA should have. Each approach results from a specific definition of the dominance relation. When unknown values are met, one particular approach with index i (DRSA-mvi) is chosen according to required properties such as rough inclusion and complementarity. It is worth noting that DRSA-mv2 is equivalent to the approach described in [45]. Fig. 1 shows the relationships among the present and aforementioned works, in which the relation plays a crucial role in attribute reduction. It can also guide the choice of a suitable approach. That is to say, the relation to be used is completely determined by the problem at hand.
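As an illustration of the first of these formalisms, the expanded dominance relation is usually stated as: x dominates y if, on every criterion, either the value of x is at least that of y or one of the two values is unknown. The sketch below follows that reading. It is not the characteristic-based relation proposed in this paper, and the `None` sentinel, the toy table and the function name are assumptions.

```python
def expanded_dominates(table, criteria, x, y):
    """Expanded dominance: an unknown value (None) is comparable with anything."""
    return all(
        table[x][c] is None or table[y][c] is None or table[x][c] >= table[y][c]
        for c in criteria
    )


# Hypothetical toy data: y has an unknown value on c2.
table = {
    "x": {"c1": 2, "c2": 1},
    "y": {"c1": 1, "c2": None},
    "z": {"c1": 1, "c2": 3},
}
criteria = ["c1", "c2"]

x_over_y = expanded_dominates(table, criteria, "x", "y")
y_over_z = expanded_dominates(table, criteria, "y", "z")
x_over_z = expanded_dominates(table, criteria, "x", "z")
```

In this example x dominates y and y dominates z, yet x does not dominate z, which shows that a relation defined this way is reflexive but need not be transitive.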
A crucial application of RST is rule induction. The lower and upper approximations are employed to derive certain and possible decision rules, respectively. However, these two sets of rules overlap because the lower approximation is a subset of the upper approximation. The theory of three-way decisions (3WD) is an attempt to deal with this defect [20], [67], [68]. Its basic ideas come from Pawlak’s rough sets [37] and probabilistic rough sets [65], [66], [71]. It helps to interpret the positive, negative and boundary regions of a crisp or fuzzy set as three decision outcomes in a ternary classification: acceptance, rejection, and uncertainty (or deferment). However, the theory of 3WD has not yet been applied in the context of ordered decision tables (ODTs). In this paper, approximations of ordered sets are explained in the framework of three-way decision theory to fill the gap between DRSA and 3WD. This is another contribution of our work.
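The three regions follow directly from a pair of approximations, as the following minimal sketch shows. The universe and the two approximations are hypothetical toy sets, and the function name is an assumption.

```python
def three_way_regions(universe, lower, upper):
    """Split the universe into positive, boundary and negative regions."""
    if not (lower <= upper <= universe):
        raise ValueError("expected lower <= upper <= universe")
    positive = set(lower)          # accept
    boundary = upper - lower       # defer
    negative = universe - upper    # reject
    return positive, boundary, negative


# Hypothetical toy data.
universe = {"s1", "s2", "s3", "s4"}
positive, boundary, negative = three_way_regions(
    universe, lower={"s3"}, upper={"s1", "s2", "s3"}
)
```

The three regions are pairwise disjoint and cover the universe, so rules induced from them no longer overlap.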
The other parts of this paper are organized as follows. In Section 2, we recall some preliminary concepts about IOISs. In Section 3, we introduce a characteristic-based dominance relation to an IOIS. Based on this relation, we establish another extension of dominance-based rough set approach. In Section 4, an approach to attribute reduction in IOISs is investigated by introducing the discernibility matrix and the discernibility function. In Section 5, an approach to computing all relative reducts is presented in consistent incomplete ordered decision tables. In both Sections 4 and 5, a heuristic algorithm is developed to find a unique reduct and a single relative reduct respectively. Some numerical experiments are conducted to test the validity of the proposed techniques in Section 6. Finally, some concluding remarks and suggestions for further work are made in Section 7.
Section snippets
Preliminaries
In this section we mainly recall several basic concepts and introduce some notations.
An information system is a quadruple (U, AT, V, f), where U is a finite non-empty set of objects, AT is a finite non-empty set of attributes, V is the union of the domains Va with Va the domain of attribute a, and f: U × AT → V is a total function such that f(x, a) ∈ Va for every a ∈ AT, x ∈ U, called an information function.
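One possible way to represent such a system in code is sketched below. The class name, the field names and the toy data are hypothetical. The constructor only checks the two conditions stated in the definition: f is total, and f(x, a) belongs to the domain of a.

```python
from dataclasses import dataclass


@dataclass
class InformationSystem:
    objects: set        # U, finite and non-empty
    attributes: set     # AT, finite and non-empty
    domains: dict       # maps each attribute a to its domain V_a
    values: dict        # maps each pair (x, a) to f(x, a)

    def __post_init__(self):
        if not self.objects or not self.attributes:
            raise ValueError("U and AT must be non-empty")
        for x in self.objects:
            for a in self.attributes:
                if (x, a) not in self.values:
                    raise ValueError(f"f is not total: missing ({x}, {a})")
                if self.values[(x, a)] not in self.domains[a]:
                    raise ValueError(f"f({x}, {a}) lies outside the domain of {a}")

    def f(self, x, a):
        """Information function f: U x AT -> V."""
        return self.values[(x, a)]


# Hypothetical toy system with two objects and two attributes.
system = InformationSystem(
    objects={"x1", "x2"},
    attributes={"cough", "fever"},
    domains={"cough": {"yes", "no"}, "fever": {"normal", "high"}},
    values={
        ("x1", "cough"): "yes", ("x1", "fever"): "high",
        ("x2", "cough"): "no", ("x2", "fever"): "normal",
    },
)
```

An incomplete system could be modelled in the same structure by adding special symbols for unknown values to each domain, although the symbols used in the paper are not visible in this excerpt.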
In practical decision-making analysis, we always consider a binary dominance relation between objects that
Characteristic-based dominance relations in incomplete ordered information systems
In this section, characteristic-based dominance relations are first presented to deal with incomplete ordered information systems whose unknown values have two interpretations. Approximations of an arbitrary set, expressed in terms of granules of knowledge, are then introduced based on this relation.
Attribute reduction in incomplete ordered information systems
Attribute reduction is one of the major topics in rough set theory and is also discussed in DRSA. In the following, the concept of attribute reduction in incomplete ordered information systems is proposed, and then the discernibility matrix method of computing all reducts is introduced. A heuristic algorithm with the forward greedy search strategy is developed to find a unique reduct as well.
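The general idea of a discernibility matrix combined with a greedy search can be sketched for the simpler case of a complete ordered information system. This is an illustrative sketch under that assumption, not the algorithm developed in this paper. The function names and the toy table are hypothetical, and the backward pruning step is an addition that removes criteria made redundant by later choices.

```python
from itertools import permutations


def discernibility_entries(table, criteria):
    """For each ordered pair (x, y) where x does not dominate y,
    collect the criteria on which x is strictly worse than y."""
    entries = []
    for x, y in permutations(table, 2):
        worse = frozenset(c for c in criteria if table[x][c] < table[y][c])
        if worse:
            entries.append(worse)
    return entries


def greedy_reduct(table, criteria):
    """Forward greedy selection followed by backward pruning."""
    entries = discernibility_entries(table, criteria)
    chosen, uncovered = [], list(entries)
    while uncovered:
        # Pick the criterion that appears in the most uncovered entries.
        best = max(criteria, key=lambda c: sum(c in e for e in uncovered))
        chosen.append(best)
        uncovered = [e for e in uncovered if best not in e]
    for c in list(chosen):
        rest = [d for d in chosen if d != c]
        if all(any(d in e for d in rest) for e in entries):
            chosen = rest  # c is redundant given the remaining criteria
    return chosen


def dominance_pairs(table, criteria):
    """All pairs (x, y) such that x dominates y on the given criteria."""
    return {
        (x, y) for x in table for y in table
        if all(table[x][c] >= table[y][c] for c in criteria)
    }


# Hypothetical toy data: "art" orders the objects exactly as "math" does.
table = {
    "s1": {"math": 9, "physics": 8, "art": 9},
    "s2": {"math": 7, "physics": 7, "art": 7},
    "s3": {"math": 8, "physics": 9, "art": 8},
}
criteria = ["math", "physics", "art"]
reduct = greedy_reduct(table, criteria)
```

The selected subset intersects every non-empty entry, so it induces the same dominance relation as the full set of criteria. Ties are broken by the order of `criteria`, so a different ordering may return a different reduct.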
Given an IOIS, for a ∈ B, if we say knowledge a is redundant or superfluous for B,
Relative reducts in incomplete ordered decision tables
In this section, we study the case of IODTs. Let us restate that in an IODT, there is no unknown attribute value for the decision criterion.
Recall that in DRSA the quality of classification by an attribute subset A is . As to the task in Example 3.3, we have .
Definition 5.1 Let be an IODT, where d is an overall preference of objects. Denote
then is a dominance relation induced by the decision
Experimental analysis
In this section, some real-world tasks are gathered in the empirical study to examine the effectiveness and efficiency of the proposed methods.
Monotonic classification is an important class of ordinal classification tasks, where the decision values are discrete and ordinal. The instances are assigned to specified classes which are not only disjoint but also ordered, and there are monotonicity constraints between condition features and decision classes. For example, in sensory tests of wine
Conclusions and future work
The dominance-based rough set approach generalizes rough set theory by using dominance relations rather than equivalence relations. Incompleteness is a common characteristic of information systems in practical applications, especially for large-scale data sets. The main objective of this paper is to introduce characteristic-based dominance relations to incomplete ordered information systems with unknown attribute values being lost or absent. The characteristic-based
Acknowledgments
The authors are very grateful to the anonymous referees for their constructive comments and suggestions that led to an improved version of this paper. This research was supported by the National Natural Science Foundation of China (grant nos. 61179038, 11571010) and the Fundamental Research Funds for the Central Universities (grant no. 2015201020201).
References (72)
- et al., Inductive discovery of laws using monotonic rules, Eng. Appl. Artif. Intell. (2012)
- et al., Induction of ordinal classification rules from incomplete data, Lect. Notes Comput. Sci. (2012)
- et al., Mining incomplete data with singleton, subset and concept probabilistic approximations, Inf. Sci. (2014)
- et al., Consistency of incomplete data, Inf. Sci. (2015)
- et al., Support-vector networks, Mach. Learn. (1995)
- et al., Approximate distribution reducts in inconsistent interval-valued ordered decision tables, Inf. Sci. (2014)
- et al., Computers and Intractability: A Guide to the Theory of NP-Completeness (1979)
- et al., Rough approximation by dominance relations, Int. J. Intell. Syst. (2002)
- et al., Idiot’s Bayes—not so stupid after all?, Int. Stat. Rev. (2001)
- et al., Feature selection for monotonic classification, IEEE Trans. Fuzzy Syst. (2012)
- Another look at measures of forecast accuracy, Int. J. Forecast.
- Rough sets attributes reduction based expert system in interlaced video sequences, IEEE Trans. Consum. Electron.
- Rules in incomplete information systems, Inf. Sci.
- A rough sets based characteristic relation approach for dynamic attribute generalization in data mining, Knowl. Based Syst.
- Statistical Analysis with Missing Data
- Fast algorithms for computing rough approximations in set-valued decision systems while updating criteria values, Inf. Sci.
- Fuzzy-rough sets for information measures and selection of relevant genes from microarray data, IEEE Trans. Syst. Man Cybern. Part B: Cybern.
- Synthesis and Optimization of Digital Circuits
- Rudiments of rough sets, Inf. Sci.
- Rough sets and Boolean reasoning, Inf. Sci.
- Interval-valued analysis for discriminative gene selection and tissue sample classification using microarray data, Genomics
- Set-valued ordered information systems, Inf. Sci.
- An efficient accelerator for attribute reduction from incomplete data in rough set framework, Pattern Recognit.
- Dominance relation and rules in an incomplete ordered information system, Int. J. Intell. Syst.
- The discernibility matrices and functions in information systems, in: Intelligent Decision Support: Handbook of Applications and Advances of the Rough Set Theory
- Reducts and constructs in classic and dominance-based rough sets approach, Inf. Sci.
- Rough sets as a front end of neural-networks texture classifiers, Neurocomputing
- Automated extraction of medical expert system rules from clinical databases based on rough set theory, Inf. Sci.
- The Nature of Statistical Learning Theory
- The Complexity of Boolean Functions
- Further investigation of characteristic relation in incomplete information system, Syst. Eng.—Theory Pract.
- Dominance-based rough set approach and knowledge reductions in incomplete ordered information system, Inf. Sci.
- Neighborhood systems-based rough sets in incomplete information system, Knowl. Based Syst.
- α-dominance relation and rough sets in interval-valued information systems, Inf. Sci.
- Probabilistic approaches to rough sets, Expert Syst.
- Three-way decisions with probabilistic rough sets, Inf. Sci.