A new approach to generate weighted fuzzy rules using genetic algorithms for estimating null values

https://doi.org/10.1016/j.eswa.2007.07.033

Abstract

In this paper, we present a new method to generate weighted fuzzy rules using genetic algorithms for estimating null values in relational database systems, where there are negative functional dependency relationships between attributes. The proposed method achieves higher average estimated accuracy rates than the method presented in [Chen, S. M., & Huang, C. M. (2003). Generating weighted fuzzy rules from relational database systems for estimating null values using genetic algorithms. IEEE Transactions on Fuzzy Systems, 11(4), 495–506].

Introduction

In traditional relational database systems, there are functional dependency relationships among attributes. For example, assume that there is a relation R with attributes A and B. If the value of attribute B of a tuple increases when the value of attribute A of the tuple increases, then we say that there is a positive dependency relationship between attribute A and attribute B. On the other hand, if the value of attribute B of a tuple decreases when the value of attribute A of the tuple increases, then we say that there is a negative dependency relationship between attribute A and attribute B.

In recent years, relational database systems have been widely used in enterprises. However, a relational database system will not operate properly if null values of attributes exist in the system. Cheng and Wang (2006) pointed out that a basic problem with null values is that they have many plausible interpretations. They also pointed out that the various manifestations of null values can be reduced to two basic interpretations (Zaniolo, 1984), namely:

  • (1) The unknown interpretation: A value exists but it is not known.

  • (2) The nonexistent interpretation: A value does not exist.
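
The positive and negative dependency relationships defined earlier in this section can be illustrated with a small sketch. It is not part of the proposed method: it substitutes the sign of the Pearson correlation coefficient as a rough indicator of the direction of dependency, and the attribute names `experience` and `car_age`, as well as all values, are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute shows no direction of dependency
    return cov / sqrt(var_x * var_y)


def dependency_direction(xs, ys):
    """Classify the dependency between two attributes by the sign of the correlation."""
    r = pearson(xs, ys)
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Hypothetical values, not taken from the paper's tables.
experience = [1, 3, 5, 8, 10]    # years of experience
salary = [30, 38, 45, 60, 70]    # rises as experience rises
car_age = [1, 2, 4, 6, 9]        # age of a secondhand car in years
price = [90, 80, 62, 45, 30]     # falls as the car gets older

print(dependency_direction(experience, salary))  # positive
print(dependency_direction(car_age, price))      # negative
```

In the second pair the value of one attribute falls as the other rises, which is the negative dependency situation that this paper addresses.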

In recent years, several methods (Chen and Chen, 2000; Chen and Yeh, 1997; Chen and Huang, 2003; Chen and Lee, 2003; Chen and Lee, 2005; Chen and Hsiao, 2005; Cheng and Wang, 2006) have been presented to estimate null values in relational database systems based on fuzzy set theory (Zadeh, 1965; Chen, 1988).

Chen and Chen (2000) presented a method to estimate null values in distributed relational database systems, where an “employee database” is used to illustrate their method for estimating the null values of the attributes “Degree” and “Salary”, respectively. However, the method of Chen and Chen (2000) has a drawback: the fuzzy rules are given directly by experts and are not generated by the system automatically. Chen and Yeh (1997) presented a method to estimate null values in relational database systems by generating fuzzy rules from relational database systems. They proposed a fuzzy concept learning system (FCLS) algorithm to construct a fuzzy decision tree from the “employee database”, and then generated fuzzy rules automatically from the constructed fuzzy decision tree for estimating the null values of the attribute “Salary” of the employee database. Chen and Huang (2003) presented a method to estimate null values in relational database systems that uses fuzzy set theory and genetic algorithms (Holland, 1975) to adjust the weights of the attributes in the antecedent parts of the generated fuzzy rules, where the “employee database” is used to illustrate their method for estimating the null values of the attribute “Salary”. Chen and Lee (2003) presented a method to generate fuzzy rules from relational database systems for estimating null values based on the statistical concepts of the “coefficient of determination” and “regression equations”, where the “employee database” is used to illustrate their method for estimating the null values of the attribute “Salary”. Chen and Lee (2005) presented a method for estimating null values in relational database systems based on genetic algorithms. It tunes the membership functions of the linguistic values of the attributes in the “employee database” for estimating the null values of the attribute “Salary”. Chen and Hsiao (2005) presented a method to estimate null values in relational database systems based on automatic clustering techniques, where the “employee database” is used to illustrate their method for estimating the null values of the attribute “Salary”. Cheng and Wang (2006) presented an approach for estimating null values in relational database systems using clustering techniques, where the “employee database” is used to illustrate their method for estimating the null values of the attribute “Salary”. However, these methods do not consider the situation in which there are negative dependency relationships between attributes. Therefore, it is necessary to develop a new method for estimating null values in relational database systems in which there are negative dependency relationships between attributes.

In this paper, we present a new method to generate weighted fuzzy rules using genetic algorithms for estimating null values in relational database systems having negative functional dependency relationships between attributes. The difference between the proposed method and the existing methods is that it uses genetic algorithms rather than clustering techniques for estimating null values in relational database systems. The proposed method gets higher average estimated accuracy rates than the method presented in (Chen & Huang, 2003).

The rest of this paper is organized as follows. In Section 2, we briefly review basic concepts of genetic algorithms (Holland, 1975). In Section 3, we present a method to estimate null values in relational database systems by tuning the weights of attributes. In Section 4, we use the “Benz secondhand car database” (Huang & Chen, 2002) in an experiment that compares the average estimated error rate of the proposed method with that of the method presented in (Chen & Huang, 2003). The conclusions are discussed in Section 5.

Section snippets

Basic concepts of genetic algorithms

The concept of genetic algorithms, proposed by Holland (1975), is based on the theory of evolution proposed by Charles Darwin. A genetic algorithm searches for optimum solutions to a problem in a way similar to the evolution process of a species. In a genetic algorithm, we encode the parameters of a solution into a numerical stream, where the numerical stream is called a chromosome. The basic element of a chromosome is called a gene. A genetic algorithm uses a fitness function to calculate the degree
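
The terms above (chromosome, gene, fitness function) can be made concrete with a minimal sketch of a generic genetic algorithm. It is an illustration under stated assumptions, not the encoding or fitness function used in this paper: each gene is assumed to be a real-valued weight in [0, 1], the `target` vector and the toy `fitness` function are hypothetical stand-ins for an accuracy-based measure, and tournament selection, one-point crossover, and elitism are choices made here for brevity.

```python
import random


def fitness(chromosome, target):
    """Toy fitness in [0, 1]: 1 minus the mean absolute gap to a target vector."""
    error = sum(abs(g - t) for g, t in zip(chromosome, target)) / len(target)
    return 1.0 - error


def evolve(target, pop_size=20, generations=50,
           crossover_rate=0.9, mutation_rate=0.1, seed=0):
    """Evolve a population of real-valued chromosomes and return the fittest one."""
    rng = random.Random(seed)
    n = len(target)
    # Each chromosome is a list of n genes drawn uniformly from [0, 1).
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]

    def score(chromosome):
        return fitness(chromosome, target)

    for _ in range(generations):
        # Elitism: carry the current best chromosome over unchanged.
        next_population = [max(population, key=score)[:]]
        while len(next_population) < pop_size:
            # Tournament selection: the fitter of two random chromosomes is a parent.
            parent_a = max(rng.sample(population, 2), key=score)
            parent_b = max(rng.sample(population, 2), key=score)
            child = parent_a[:]
            # One-point crossover: prefix from one parent, suffix from the other.
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n)
                child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally replace a gene with a fresh random value.
            child = [rng.random() if rng.random() < mutation_rate else gene
                     for gene in child]
            next_population.append(child)
        population = next_population
    return max(population, key=score)


target = [0.2, 0.8, 0.5]  # hypothetical "ideal" weights, for illustration only
best = evolve(target)
print(best, fitness(best, target))
```

Because the best chromosome of each generation is copied into the next, the best fitness in the population never decreases from one generation to the next.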

A new method for estimating null values in relational database systems using genetic algorithms

In this section, we present a new method to generate weighted fuzzy rules using a genetic algorithm for estimating null values in relational database systems, where there are negative functional dependency relationships between attributes. In a genetic algorithm, we must define the format of a chromosome in a population. For example, we use the relation of “Secondhand Cars” shown in Table 1 to describe how to define the chromosomes. Fig. 1 shows the membership functions of the linguistic terms

Experimental results

Assume that there is a relation in a relational database containing a null value as shown in Table 4, where Table 4 is derived from Table 1 by letting the value of the attribute “Price” of tuple T1 be a null value.

In order to estimate the null value of the attribute “Price” of the tuple T1 whose Car-ID is 1, we must find a tuple that is closest to tuple T1. The process for computing the degree of closeness of the tuple T1 with respect to the other tuples is described as follows. Take tuple T2

Conclusions

In this paper, we have presented a new method to generate weighted fuzzy rules using genetic algorithms for estimating null values in the “Benz Secondhand Car” database, where there are negative functional dependency relationships between attributes. From Table 6, we can see that the proposed method has smaller average estimated error rates than the method presented in (Chen & Huang, 2003) with respect to different numbers of training instances and different numbers of testing instances. That

Acknowledgement

This work was supported in part by the National Science Council, Republic of China, under Grant NSC 95-2221-E-011-117-MY2.

References (19)

  • S.M. Chen et al., A new method to estimate null values in relational database systems based on automatic clustering techniques, Information Sciences (2005)
  • L.A. Zadeh, Fuzzy sets, Information and Control (1965)
  • C. Zaniolo, Database relations with null values, Journal of Computer and System Sciences (1984)
  • S.M. Chen, A new approach to handling fuzzy decisionmaking problems, IEEE Transactions on Systems, Man, and Cybernetics (1988)
  • S.M. Chen et al., Estimating null values in the distributed relational databases environment, Cybernetics and Systems (2000)
  • S.M. Chen et al., Generating fuzzy rules from relational database systems for estimating null values, Cybernetics and Systems (1997)
  • S.M. Chen et al., Generating weighted fuzzy rules from relational database systems for estimating null values using genetic algorithms, IEEE Transactions on Fuzzy Systems (2003)
  • S.M. Chen et al., A new method to generate fuzzy rules from relational database systems for estimating null values, Cybernetics and Systems (2003)
  • S.M. Chen et al., Estimating null values in relational database systems based on genetic algorithms, Cybernetics and Systems (2005)
There are more references available in the full text version of this article.
