Statistical treatment of the information content of a database

https://doi.org/10.1016/0306-4379(86)90029-3

Abstract

Statistical analysis of database contents is usually performed with software packages that require database attributes to be coded numerically. Unfortunately, statistics computed from such coded attribute values may be meaningless, as is the case when the attribute values have no intrinsic order.

We present an analytical data model in which the information content of a database relation is represented by a contingency table and analysed using the methods of multivariate information theory. These quantitative tools benefit, first, the database user who wants a statistical view of the database contents and is inclined to pose queries such as “to what extent are attributes related (in a given database state)?” or “how does one attribute depend on the others (in a given database state)?”. A second application, only sketched here, is the measurement of record selectivities for queries, with a view to evaluating the performance of the physical database organization.
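To make the two user queries concrete, the following is a minimal sketch, not the paper's own formulation. It assumes a relation held as a list of dictionaries, builds the contingency table as a count of attribute-value combinations, and uses Shannon entropy in bits. The function names (`entropy`, `mutual_information`, `dependence`), the sample relation, and the normalized dependence ratio are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Shannon entropy (bits) of the joint distribution of `attrs`.

    The Counter is the contingency table: one cell per observed
    combination of attribute values, holding its tuple count.
    """
    table = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in table.values())


def mutual_information(rows, x, y):
    """'To what extent are attributes x and y related?'
    I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(rows, [x]) + entropy(rows, [y]) - entropy(rows, [x, y])


def dependence(rows, target, others):
    """'How does one attribute depend on the others?'
    Assumed measure: the share of H(target) removed by knowing `others`,
    i.e. 1 - H(target | others) / H(target)."""
    h_target = entropy(rows, [target])
    if h_target == 0:
        return 1.0  # a constant attribute is trivially determined
    h_cond = entropy(rows, list(others) + [target]) - entropy(rows, list(others))
    return 1 - h_cond / h_target


# Hypothetical database state: 'dept' determines 'city'; 'grade' is unrelated.
rows = [
    {"dept": "sales", "city": "Bari", "grade": "a"},
    {"dept": "sales", "city": "Bari", "grade": "b"},
    {"dept": "research", "city": "Rome", "grade": "a"},
    {"dept": "research", "city": "Rome", "grade": "b"},
]

mi_dept_city = mutual_information(rows, "dept", "city")
mi_dept_grade = mutual_information(rows, "dept", "grade")
dep_city_on_dept = dependence(rows, "city", ["dept"])
```

No numeric coding of the attribute values is needed: the measures depend only on the cell counts, so they remain meaningful for values with no intrinsic order.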
