Derivation digraphs for dependencies in ordinal and similarity-based data
Introduction
Reasoning with if-then rules is an important topic in computer science and relational data analysis. In its simplest form, the rules are considered as implications between conjunctions of attributes (features) and are intended to describe if-then dependencies between the presence or absence of attributes. Since conjunction is idempotent, associative, and commutative, the rules are usually written as expressions A ⇒ B, where A and B are finite sets of attributes. The basic meaning of A ⇒ B is that if an object has all the attributes from A, then it has all the attributes from B. Rules of this form have been extensively studied in relational data analysis, in particular in formal concept analysis (FCA, see [18]), where they are known as attribute implications and serve as an indirect description of object-attribute clusters, called formal concepts, that can be found in object-attribute incidence data. The seminal contribution to FCA regarding attribute implications is [22], where the authors described minimal sets of attribute implications that convey information about all if-then dependencies valid in the input data. Note that attribute implications can also be seen as a particular case of association rules where the support is disregarded and one is only interested in mining rules with confidence equal to 1, see [1], [26], [30], [37].
The rules also play a central role in dependency theory, most notably in relational database systems [15], [28], where they are used to specify constraints on relation schemes and are called functional dependencies. Syntactically, functional dependencies are the same formulas as attribute implications, but their interpretation is different. The rules are interpreted in relations on relation schemes (formal counterparts to database tables), and the basic meaning of A ⇒ B is that any two tuples which agree on all values of the attributes from A must also agree on all values of the attributes from B.
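The two readings of a rule A ⇒ B can be contrasted in a small sketch. The function names and the data layout (a dictionary of objects for the FCA reading, a list of row dictionaries for the database reading) are illustrative assumptions, not notation from the paper.

```python
def holds_as_attribute_implication(context, A, B):
    """FCA reading: every object that has all attributes of A has all of B.

    `context` maps each object to the set of attributes it has
    (hypothetical layout chosen for this sketch).
    """
    return all(B <= attrs for attrs in context.values() if A <= attrs)


def holds_as_functional_dependency(rows, A, B):
    """Database reading: tuples that agree on A must agree on B.

    `rows` is a list of dicts mapping attribute names to values
    (hypothetical layout chosen for this sketch).
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in sorted(A))
        val = tuple(row[b] for b in sorted(B))
        # The first tuple with a given A-part fixes the B-part;
        # any later tuple with the same A-part must match it.
        if seen.setdefault(key, val) != val:
            return False
    return True
```

The same formula is thus checked against two different kinds of structures; the fact that both readings lead to the same entailment relation is what allows one inference system to serve both.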
An interesting property is that both interpretations of the if-then rules yield the same notion of semantic entailment. As a result, one can use a single inference system for reasoning with both attribute implications and functional dependencies. The best-known inference system was proposed by Armstrong [3] and can be presented as a system consisting of two rules [25]:
- (Ax) infer A ∪ B ⇒ A,
- (Cut) from A ⇒ B and B ∪ C ⇒ D infer A ∪ C ⇒ D,
where A, B, C, D are subsets of attributes. Various other systems have been proposed in order to normalize proofs and enable automated deduction techniques [2]. An interesting alternative graph-based approach that is also aimed at possible automated proving has been proposed by Maier in [27], see also [28] for an extensive description and its application for theorem proving. Note that in [6] the authors use derivation trees which can be seen as an analogous approach. Later related works include [34] on hypergraph formalisms for representation of database schemes and [4] introducing FD-graphs analogous to the approach by Maier.
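For the classical (ungraded) rules, entailment is commonly decided by computing an attribute-set closure rather than by searching for an Armstrong-style proof. The following is a minimal sketch of that standard closure test, not of the graph-based methods cited above; the names `closure` and `entails` and the representation of a theory as a list of (premise, conclusion) pairs are assumptions made for the example.

```python
def closure(theory, attrs):
    """Least superset of `attrs` closed under every rule in `theory`.

    `theory` is a list of (premise, conclusion) pairs of sets.
    """
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in theory:
            # Fire a rule when its premise is contained in the current set
            # and it still contributes something new.
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def entails(theory, A, B):
    """A => B follows from `theory` iff B is contained in the closure of A."""
    return set(B) <= closure(theory, A)


# {a} => {b} and {b, c} => {d} together entail {a, c} => {d},
# which is the pattern captured by the (Cut) rule.
theory = [({"a"}, {"b"}), ({"b", "c"}, {"d"})]
print(entails(theory, {"a", "c"}, {"d"}))
print(entails(theory, {"a"}, {"d"}))
```

The closure test agrees with provability because the inference system is sound and complete for the entailment described in the text.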
The present paper is devoted to graph-based inference for if-then rules where the attributes are graded, meaning that each attribute in the antecedent and consequent of A ⇒ B is equipped with a threshold degree saying that an object must have the attribute at least to the specified degree or, in the database interpretation, that two tuples must have similar values of the attribute at least to the specified degree. Technically, the rules are considered as implications between graded sets (fuzzy sets), i.e., rules of the form A ⇒ B, where A and B are maps from the set of all attributes to a suitable scale of degrees (typically the real unit interval or its finite subset). Rules of this form naturally appear in dependency analysis of object-attribute relational data with grades [8] and in similarity-based relational databases [13], and have been proposed and studied in the past, see [8] for an overview of results. If-then formulas of a similar form, with antecedents containing constants for truth degrees, were studied in [10] and proved useful for characterizing classes of algebras with fuzzy equalities closed under various class operators (varieties, quasivarieties, semivarieties, and sur-reflective classes, see [11], [36]). We therefore consider the exploration of inference systems for rules in this broad family an important task from both the theoretical and practical points of view.
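To make the graded reading concrete, the sketch below evaluates the degree to which a graded rule A ⇒ B holds in a single graded set M of attributes. It relies on assumptions that this excerpt does not spell out: the Łukasiewicz operations on rational degrees are picked as one possible scale, the degree is taken to be S(A, M)* → S(B, M), where S is the graded subsethood and * is a hedge, as is usual in the fuzzy attribute implication literature, and all function names are illustrative.

```python
from fractions import Fraction as F


def residuum(a, b):
    """Lukasiewicz residuum a -> b (assumed choice of structure)."""
    return min(F(1), 1 - a + b)


def identity(a):
    """Hedge that leaves degrees unchanged."""
    return a


def globalization(a):
    """Hedge that keeps only full truth."""
    return F(1) if a == 1 else F(0)


def subsethood(A, M, attributes):
    """Degree S(A, M) to which the graded set A is contained in M."""
    return min(residuum(A.get(y, F(0)), M.get(y, F(0))) for y in attributes)


def degree(A, B, M, attributes, hedge=identity):
    """Degree to which A => B holds in M: S(A, M)* -> S(B, M)."""
    return residuum(hedge(subsethood(A, M, attributes)),
                    subsethood(B, M, attributes))


Y = ["age", "salary"]
A = {"age": F(1, 2)}        # "age at least to degree 1/2"
B = {"salary": F(1)}        # "salary fully"
M = {"age": F(1, 4), "salary": F(1, 2)}
print(degree(A, B, M, Y, identity))
print(degree(A, B, M, Y, globalization))
```

With the identity hedge a partially satisfied antecedent still constrains the consequent, whereas with globalization only a fully satisfied antecedent does; this is one way the choice of hedge parameterizes the semantics.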
Looking for a graph-based inference system for graded if-then rules is interesting from several viewpoints. First, the notion of semantic entailment of the rules we consider is graded, i.e., the entailment expresses a degree to which a rule follows from other rules. It is therefore interesting to find a graph-based inference system that is able to infer rules from other ones including the entailment degrees. Second, there is an Armstrong-like axiomatization of the semantic entailment [8] for the graded rules, i.e., one might be interested in finding a corresponding graph-based inference method. Third, the Armstrong-like proofs can be formalized to form particular sequences (so-called MRAP-sequences, see [12]). It is therefore interesting to observe whether the graph-based proofs can be constructed according to the normalized proofs and vice versa. We address all the issues in this paper.
The paper is organized as follows. In Section 2, we present preliminaries from residuated structures and fuzzy attribute implications (FAIs). Section 3 introduces derivation digraphs as particular labeled acyclic digraphs constructed from collections of FAIs. In Section 4, we prove that the construction of digraphs can be used to describe degrees of semantic entailment, i.e., we show a completeness result using the graph-based inference. Sections 5 and 6 elaborate on the related issues of computing closures and obtaining proofs in normalized forms. We conclude the paper with an illustrative example in Section 7 and conclusions in Section 8.
Preliminaries
We recall basic notions of directed graphs, residuated lattices, and fuzzy attribute implications. Details can be found in [5], [7], [17], [21], [23].
A directed graph (a digraph) is a pair D = ⟨V, A⟩, where V is a nonempty finite set of elements called vertices and A is a binary relation A ⊆ V × V; each ⟨v, w⟩ ∈ A is called an arc. V and A are called the vertex set and the arc set of D, respectively. If ⟨v, w⟩ ∈ A, we say that the arc leaves v and enters w. A digraph D is acyclic (in short, D is
Derivation acyclic digraphs for FAIs
We now introduce derivation digraphs as particular acyclic digraphs where vertices are labeled by attributes from Y and degrees from L. The arcs of the digraphs will correspond to fuzzy attribute implications from an input theory and indicate which formulas from the theory are used in the process of inference. In what follows, L is a complete residuated lattice. In order to denote that * is a hedge on L, we write L*. Definition 1 Let T be a set of FAIs over Y. Any such that and for every T-based L*-derivation DAG
Completeness
We now turn our attention to completeness, by which we mean a characterization of the semantic entailment by the existence of L*-derivation DAGs. We prove the claim by showing that a FAI is provable from a theory T iff it has a T-based L*-derivation DAG.
In order to simplify the proofs in this section, we use the following derived inference rules [8, Lemma 3.1]:
- (Add) from A ⇒ B and A ⇒ C infer A ⇒ B ∪ C,
- (Pro) from A ⇒ B ∪ C infer A ⇒ B.
By a derivable rule we mean that for all , from the part preceding “infer”,
Computing closures
Viewing the construction of T-based L*-derivation DAGs as an alternative proof technique not only helps visualize the inference from if-then rules; the construction of such DAGs also yields algorithms for checking whether (and to what degree) A ⇒ B semantically follows from a theory. Indeed, in order to check whether A ⇒ B follows from T to degree 1, we may proceed as follows: Procedure 1 [Checking of full entailment] For any T and A ⇒ B: Construct a T-based L*-derivation DAG with If ,
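Since the details of Procedure 1 are not reproduced in this excerpt, the following sketch shows a different but related computation: the fixpoint closure commonly used for graded attribute implications, from which an entailment degree is read off as S(B, cl_T(A)). It is offered under stated assumptions rather than as the paper's DAG construction. The assumptions are: Łukasiewicz operations on a finite chain of rational degrees, a theory given as a list of (antecedent, consequent) pairs of graded sets, and illustrative function names.

```python
from fractions import Fraction as F


def tensor(a, b):
    """Lukasiewicz multiplication (assumed choice of structure)."""
    return max(F(0), a + b - 1)


def residuum(a, b):
    """Lukasiewicz residuum a -> b."""
    return min(F(1), 1 - a + b)


def identity(a):
    return a


def globalization(a):
    return F(1) if a == 1 else F(0)


def subsethood(A, M, attributes):
    """Degree S(A, M) to which the graded set A is contained in M."""
    return min(residuum(A.get(y, F(0)), M.get(y, F(0))) for y in attributes)


def closure(theory, A, attributes, hedge):
    """Iterate M := M u U{ S(E, M)* (x) G | E => G in theory } to a fixpoint."""
    M = {y: A.get(y, F(0)) for y in attributes}
    while True:
        N = dict(M)
        for E, G in theory:
            d = hedge(subsethood(E, M, attributes))
            for y in attributes:
                N[y] = max(N[y], tensor(d, G.get(y, F(0))))
        if N == M:      # nothing grew, so M is closed
            return M
        M = N


def entailment_degree(theory, A, B, attributes, hedge):
    """Degree to which A => B follows from the theory: S(B, closure of A)."""
    return subsethood(B, closure(theory, A, attributes, hedge), attributes)


Y = ["a", "b", "c"]
T = [({"a": F(1, 2)}, {"b": F(1)}), ({"b": F(3, 4)}, {"c": F(1)})]
print(entailment_degree(T, {"a": F(1, 2)}, {"c": F(1)}, Y, identity))
print(entailment_degree(T, {"a": F(1, 4)}, {"b": F(1)}, Y, identity))
print(entailment_degree(T, {"a": F(1, 4)}, {"b": F(1)}, Y, globalization))
```

The loop terminates because the degrees stay within the finite set of rationals generated by the input and every step can only raise them. Checking full entailment then amounts to testing whether the returned degree equals 1.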
Proofs in normalized forms
We now show that T-based L*-derivation DAGs correspond to normalized proofs called MRAP-sequences [12]. A nonconstructive proof of this fact already follows from our previous observations: we have established a completeness result in Theorem 5, and the inference system for MRAP-sequences is also known to be complete, so there is a correspondence between T-based L*-derivation DAGs and MRAP-sequences representing the same proofs. Nevertheless, we focus in this
Illustrative example
For further illustration of the procedures discussed in the paper, we present here an extended example in which we use the similarity-based database semantics of formulas. Assume that a bank is keeping the following information about clients: city of residence (attribute c), age (attribute a), education (attribute e), job position (attribute j), salary (attribute s), loan amount (attribute l), account balance (attribute b), number of children (attribute ch), and insurance products (attribute i
Conclusions
The paper presents an initial study of graph-based inference methods for graded if-then rules. We have introduced the notion of a T-based L*-derivation directed acyclic graph (DAG), which generalizes the ordinary notion of a T-based derivation DAG from [27]. The main results show that degrees of semantic entailment of if-then rules from collections of other if-then rules can be characterized by the existence of such directed acyclic graphs. We have proved the result by showing correspondences between
Acknowledgements
L. Urbanova is supported by Grant No. P103/11/1456 of the Czech Science Foundation. V. Vychodil is supported by project reg. No. CZ.1.07/2.3.00/20.0059 of the European Social Fund in the Czech Republic.
References (37)
- et al., A logical approach to fuzzy truth hedges, Inform. Sci. (2013)
- L-fuzzy sets, J. Math. Anal. Appl. (1967)
- On very true, Fuzzy Sets Syst. (2001)
- Association discovery from relational data via granular computing, Inform. Sci. (2013)
- et al., Dual multi-adjoint concept lattices, Inform. Sci. (2013)
- et al., Efficient mining of association rules using closed itemset lattices, Inform. Syst. (1999)
- et al., Globalization of intuitionistic set theory, Ann. Pure Appl. Logic (1987)
- Rakesh Agrawal, Tomasz Imieliński, Arun Swami, Mining association rules between sets of items in large databases, in:...
- et al., A non-explosive treatment of functional dependencies using rewriting logic, Lect. Notes Artif. Intell. (2004)
- Dependency structures of data base relationships
- Graph algorithms for functional dependency manipulation, J. ACM
- Digraphs: Theory, Algorithms and Applications, Springer Monographs in Mathematics
- Computational problems related to the design of normal form relational schemas, ACM Trans. Database Syst.
- Fuzzy Relational Systems: Foundations and Principles
- Attribute implications in a fuzzy setting
- Fuzzy Horn logic I: proof theory, Arch. Math. Logic
- Fuzzy Horn logic II: implicationally defined classes, Arch. Math. Logic