Consistent data operations for multi-databases in extended possibility-based data models
Introduction
Since the introduction of fuzzy set theory in 1965 (Zadeh, 1965), fuzzy concepts have been applied to various domains, including decision-making procedures (Lin, Hsu, & Sheen, 2007), financial analysis (McIvor, McCloskey, Maguire, & Humphreys, 2004), decision support systems (Kulak, 2005) and databases (Buckles & Petry, 1982). Databases that incorporate fuzzy concepts are regarded as playing an important role in bringing real-world data to expert systems (Zemankova-Leech & Kandel, 1984) and artificial intelligence (Melton & Shenoi, 1991). For traditional databases, relational algebra (Codd, 1979, Codd, 1970) is widely adopted to represent data operations. With the addition of fuzzy concepts, relational algebra has also been extended into several forms for fuzzy databases (Bosc et al., 1995, Bosc et al., 1997, Chen et al., 1993, Dubois and Prade, 1996, Prade and Testemale, 1984, Yager, 1995). However, such extensions often raise new consistency problems that do not occur in traditional relational algebra. In traditional relational algebra, many data operations (such as project, union, intersect and equal join) involve the removal of redundant tuples. The removal is achieved directly by deleting duplicates, since redundant tuples must be identical under the crisp concept. When fuzziness is involved, redundant tuples may resemble one another without necessarily being identical. In other words, tuple redundancy can be approximated according to the closeness of the tuples.
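The contrast between crisp and approximate redundancy can be illustrated with a minimal sketch. The resemblance degrees, the threshold, and the names `RESEMBLANCE`, `resemblance`, `crisp_dedup` and `approx_dedup` are hypothetical and chosen only for illustration; they are not definitions taken from the cited models. In particular, the sketch simply keeps the first tuple of each group of close tuples, whereas the cited fuzzy models may merge redundant tuples instead.

```python
# Hypothetical resemblance degrees for a "personality" domain (assumed values).
RESEMBLANCE = {
    frozenset(["friendly", "kind"]): 0.9,
    frozenset(["friendly", "hostile"]): 0.1,
    frozenset(["kind", "hostile"]): 0.1,
}


def resemblance(a, b):
    """Degree to which two scalar values resemble each other.

    Identical values resemble each other fully; unlisted pairs are
    assumed (for this sketch only) not to resemble each other at all.
    """
    if a == b:
        return 1.0
    return RESEMBLANCE.get(frozenset([a, b]), 0.0)


def crisp_dedup(tuples):
    """Classical redundancy removal: drop exact duplicates, keep order."""
    seen, result = set(), []
    for t in tuples:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


def approx_dedup(tuples, threshold):
    """Treat a tuple as redundant if every attribute value resembles the
    corresponding value of an already kept tuple to at least `threshold`."""
    result = []
    for t in tuples:
        redundant = any(
            all(resemblance(x, y) >= threshold for x, y in zip(t, kept))
            for kept in result
        )
        if not redundant:
            result.append(t)
    return result


rows = [("Ann", "friendly"), ("Ann", "kind"), ("Bob", "hostile"), ("Ann", "friendly")]
print(crisp_dedup(rows))        # exact duplicates removed only
print(approx_dedup(rows, 0.8))  # ("Ann", "kind") is also treated as redundant
```

With a threshold above 0.9 the approximate version falls back to the crisp result for this data, which shows how strongly the outcome depends on the chosen threshold.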
Most studies dealing with extended relational algebra applied approximate tuple redundancy on the premise that the resemblance of given tuples is fixed throughout the database. This premise does not necessarily hold in a multi-database context involving fuzzy concepts. In such a context, the level of resemblance required for tuples may differ among the component databases. Since it is often necessary to integrate information from multiple databases for various applications, including decision-making (Reyes & Raisinghani, 2002), data mining (Hu & Cercone, 2004) or data warehousing (Bukhres and Elmagarmid, 1995, Calvanese et al., 2001, Zhou et al., 1995), avoiding such heterogeneity is essential for the design of multi-databases with fuzziness.
Heterogeneity may first occur in the resemblance relations among multiple fuzzy databases. A resemblance relation describes the resemblance between each pair of elements of a scalar domain in a fuzzy database. Two fuzzy databases might define the resemblance relation for a domain differently, making it difficult to estimate tuple redundancy (or data redundancy) for a result drawn from both databases. To complicate the problem further, different fuzzy databases may use different types of attribute values, such as subsets of a domain or possibility distributions on the domain. Moreover, they may originally define the closeness of attribute values using different measures and threshold values.
This work examines the heterogeneity problem of multiple fuzzy databases for the extended possibility-based (EP) data model, in which attribute values could be fuzzy sets or linguistic terms. Table 1 presents an example of a database relation in the data model.
An example of the resemblance relation for the personality attribute domain, denoted by p, is shown below:
Under such circumstances, heterogeneity results from both different resemblance relations and the different threshold values associated with them. Different relations and different associated thresholds may either induce different equivalence classes, or yield different degrees of closeness for a single pair of attribute values. Either equivalence classes (Buckles and Petry, 1982, Shenoi and Melton, 1989) or the closeness of attribute values (Chen et al., 1993, Chen et al., 1992, Guu et al., 2002, Ma et al., 2000, Rundensteiner et al., 1989) serve as the basis of data redundancy in various fuzzy database models (Buckles and Petry, 1982, Rundensteiner et al., 1989, Shenoi and Melton, 1989).
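How a threshold turns a resemblance relation into equivalence classes can be sketched as follows. This is a minimal illustration under stated assumptions: the domain, the relation values in `P1`, and the function name `equivalence_classes` are hypothetical, and the classes are computed as groups of values connected by chains of pairs whose resemblance reaches the threshold, which is one common way of obtaining a partition from a reflexive, symmetric relation.

```python
def equivalence_classes(domain, relation, threshold):
    """Partition `domain` into classes of values linked by chains of pairs
    whose resemblance degree is at least `threshold` (union-find)."""
    parent = {v: v for v in domain}

    def find(v):
        # Follow parent links to the class representative (path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (a, b), degree in relation.items():
        if degree >= threshold:
            parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for v in domain:
        classes.setdefault(find(v), set()).add(v)
    return {frozenset(c) for c in classes.values()}


# Hypothetical domain and resemblance degrees (assumed for illustration).
DOMAIN = ["kind", "friendly", "reserved", "hostile"]
P1 = {
    ("kind", "friendly"): 0.9,
    ("kind", "reserved"): 0.5,
    ("friendly", "reserved"): 0.5,
    ("kind", "hostile"): 0.1,
    ("friendly", "hostile"): 0.1,
    ("reserved", "hostile"): 0.3,
}

for t in (0.8, 0.4, 0.2):
    print(t, sorted(sorted(c) for c in equivalence_classes(DOMAIN, P1, t)))
```

Lowering the threshold from 0.8 to 0.4 merges "reserved" into the class of "kind" and "friendly", and at 0.2 the whole domain collapses into a single class, so the same relation yields different notions of redundancy at different thresholds.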
When two database relations to be manipulated conflict on data redundancy, the result of a database operation will either lose original information or conflict semantically with the original relations. At one extreme, the corresponding attributes in the database relations to be operated on must employ identical resemblance relations for a single domain. However, this constraint is too strong for the databases involved. This work therefore introduces notions of a consistency constraint to relax it. Under the consistency constraint, two different resemblance relations with appropriate threshold values for a domain can induce the same equivalence classes at any given level of cut. With the same equivalence classes, two fuzzy databases using the resemblance-based data model (Buckles and Petry, 1982, Shenoi and Melton, 1989) agree with each other in terms of data redundancy, and databases using the extended possibility-based data model (Chen et al., 1993, Chen et al., 1992, Guu et al., 2002, Ma et al., 2000, Rundensteiner et al., 1989) exhibit the same closeness for the same pair of attribute values.
The consistency constraint fulfills different purposes for distributed databases constructed using opposite approaches. For those built in a top-down manner, the constraint can guide the design of the resemblance relations in the component databases to prevent information loss when operating on data from multiple component databases. However, for multi-databases or data warehouses, where each component database was created independently, the constraint merely provides assistance in verifying that the databases agree on data redundancy.
The rest of this paper is organized as follows. Section 2 provides a preliminary review of the EP databases and some properties of the proximity relations employed in the databases. Section 3 then examines various estimates of attribute value closeness, and presents the extended algebraic operations that relate to the estimation of tuple redundancy. Next, Section 4 introduces the full consistency and conformity constraints for the EP databases, and demonstrates that the closeness of a pair of attribute values can be identical using different resemblance relations under these constraints. Conclusions are finally drawn in Section 5, along with recommendations for future research.
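The idea that two different resemblance relations can still agree on equivalence classes can be sketched as a simple check. This is only an illustration under assumptions: the relations `P1` and `P2`, the threshold pairs, and the helper names `partition` and `agree` are hypothetical, and the check compares partitions for given threshold pairs rather than implementing the formal consistency constraint, whose definition is not part of this excerpt.

```python
def partition(domain, relation, threshold):
    """Group values linked by chains of pairs with degree >= threshold."""
    classes = [{v} for v in domain]
    for (a, b), degree in relation.items():
        if degree < threshold:
            continue
        class_a = next(c for c in classes if a in c)
        class_b = next(c for c in classes if b in c)
        if class_a is not class_b:
            classes.remove(class_b)
            class_a |= class_b  # merge the two classes in place
    return {frozenset(c) for c in classes}


def agree(domain, rel1, t1, rel2, t2):
    """True if both relations induce the same classes at their thresholds."""
    return partition(domain, rel1, t1) == partition(domain, rel2, t2)


# Two hypothetical relations on the same domain: the degrees differ,
# but the pairs are ranked in the same order (assumed values).
DOMAIN = ["kind", "friendly", "reserved", "hostile"]
P1 = {("kind", "friendly"): 0.9, ("kind", "reserved"): 0.5,
      ("friendly", "reserved"): 0.5, ("kind", "hostile"): 0.1,
      ("friendly", "hostile"): 0.1, ("reserved", "hostile"): 0.3}
P2 = {("kind", "friendly"): 0.8, ("kind", "reserved"): 0.6,
      ("friendly", "reserved"): 0.6, ("kind", "hostile"): 0.2,
      ("friendly", "hostile"): 0.2, ("reserved", "hostile"): 0.4}

# Matching threshold pairs give identical classes ...
print(agree(DOMAIN, P1, 0.9, P2, 0.8))
print(agree(DOMAIN, P1, 0.5, P2, 0.6))
# ... while a mismatched pair of thresholds does not.
print(agree(DOMAIN, P1, 0.5, P2, 0.7))
```

In this toy setting the two relations never need to be numerically identical; what matters is that each threshold in one database has a counterpart in the other that cuts the domain into the same classes.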
Preliminaries
Extended possibility-based (EP) data models are hybrids of possibility-based models (Prade & Testemale, 1984) and resemblance-based models (Shenoi & Melton, 1990). With EP databases, a relation schema of arity m is represented by R(A, D, P) where A = {A1, A2, …, Am}, D = {D1, D2, …, Dm} and P = {p1, p2, …, pm}, and each Aj denotes an attribute of R, each Dj denotes the domain of Aj, and each pj is a proximity relation describing the degree of resemblance between elements in domain Dj. A schema could be
Problems for data operation in EP databases
Closeness between attribute values determines tuple redundancy, and thus the result of certain data operations. This section first investigates the closeness estimates of attribute values in the EP data models, and on this basis discusses approximate tuple redundancy. Next, the relational algebraic operations involving tuple redundancy are presented. Finally, an example is given to demonstrate the problem that occurs when two databases conflict in terms of data redundancy.
The consistent constraints of proximity relation
This section provides a constraint for the EP databases that avoids the above problem. It first reviews the consistency of proximity relations from our earlier study (Liu, Chang, & Lin, 2003) and then applies it to the data models considered. For simplicity, let D denote a given scalar domain, and consider the set of all possible proximity relations on D. An ordering, denoted by ≺, is imposed between any pair of distinct elements in D to ignore the duplicated relation values
Conclusions
The objective of this work is to provide constraints for designing extended possibility-based databases that avoid inconsistency problems in database operations. Firstly, this work investigated various methods of estimating the closeness of attribute values in order to approximate tuple redundancy in the databases considered. Secondly, it introduced the extended algebraic operations that relate to approximate redundancy for operating on data from multiple EP databases. Thirdly, this study demonstrates
Acknowledgement
This research was supported by the National Science Council under Grant 93-2213-E-155-025.
References (42)
- et al. Axiomatisation of fuzzy multivalued dependencies in a fuzzy relational data model. Fuzzy Sets and Systems (1998)
- et al. Flexible queries in relational databases – the example of the division operator. Theoretical Computer Science (1997)
- et al. Normalization based on fuzzy functional dependency in a fuzzy relational data model. Information Systems (1996)
- et al. Fuzzy lossless decompositions in databases. Fuzzy Sets and Systems (1998)
- et al. Semantics of quotient operators in fuzzy relational database. Fuzzy Sets and Systems (1996)
- et al. A fuzzy-based decision-making procedure for data warehouse system selection. Expert Systems with Applications (2007)
- et al. Using a fuzzy approach to support financial analysis in the corporate acquisition process. Expert Systems with Applications (2004)
- et al. Fuzzy relations and fuzzy relational database. Journal of Computers and Mathematics with Applications (1991)
- et al. Generalizing database relational algebra for the treatment of incomplete or uncertain information. Information Sciences (1984)
- et al. On nearness measures in fuzzy relational data models. International Journal of Approximate Reasoning (1989)
- Fuzzy functional dependencies and independencies in extended fuzzy relational database models. Fuzzy Sets and Systems
- Proximity relations in the fuzzy relational database model. Fuzzy Sets and Systems
- An extended version of the fuzzy relational database model. Information Sciences
- Functional dependencies and normal forms in the fuzzy relational database models. Information Science
- Multivalued dependencies in fuzzy relational databases. Fuzzy Sets and Systems
- Fuzzy sets. Information Control
- Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems
- A fuzzy representation of data for relational databases. Fuzzy Sets and Systems
- Object-oriented multidatabases systems: A solution for advanced applications
- Data integration in data warehousing. International Journal Cooperative Information Systems