A logical framework for large file information handling

doi:10.1016/0020-0255(75)90003-1

Abstract

As the size and efficiency of computers increase, it is clear that they will be used for extremely large file information storage and retrieval banks, in particular for non-scientific documents and data. Such an information bank might be, for example, the legal code for a large country or even the legal codes of all countries. Another example might be individual medical histories for a large population base, possibly the whole world. Storage and retrieval algorithms for such large files present special problems. It is no longer feasible, for instance, for the computer to consider every item in the memory in order to retrieve a designated subset. The problem, then, is how to classify the information, that is, divide it into disjoint categories so that retrieval is feasible, efficient, and accurate. The present paper presents a general approach or logical framework for handling such problems. The basic observation is that classification really amounts to introducing an equivalence relation on the universal set of all the items in the file. The problem is how to define these equivalence relations in a natural manner, that is, in a manner which reflects something of the structure of the information and which thus allows us to store and retrieve the information in a structured fashion. This classification procedure is conceived not as an algorithm but as a mathematical procedure on which various algorithms can be based. It can be described as follows. First, a very broad initial classification is made. For each such category, there is a standard or canonical form, which involves a finite number of parameters. Syntactically these parameters are just terms of various sorts, but our approach is instead to consider the universe of values of each parameter. This is the semantic universe associated with the given parameter.
These semantic universes are seen to have a very natural structure when they are considered in a certain way as subcollections of a type-theoretic universe based on some concrete set (such as the set of people, the set of married couples, etc.). This structure allows us to define the desired equivalence in a natural way on a given semantic universe. Taking the conjunction of the parameters occurring in a given canonical form, we thus get a classification for the sentences having that canonical form. Since this holds for each canonical form, we have a classification for the whole information bank. Concrete examples are given to illustrate the applicability of this approach. The approach is, in fact, a mathematical generalization of a concrete case already treated by this method.
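
The abstract's central observation, that a classification is an equivalence relation on the set of all items, can be made concrete: any function from items to keys induces such a relation (x ~ y exactly when the keys agree), and its equivalence classes are pairwise disjoint and together cover the whole bank. The minimal Python sketch below assumes a toy bank of tuples whose first field stands in for the broad initial classification. The function `classify` and the record layout are hypothetical illustrations, not the paper's procedure, which is stated mathematically rather than as an algorithm.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable
from typing import TypeVar

T = TypeVar("T")


def classify(items: Iterable[T], key: Callable[[T], Hashable]) -> dict[Hashable, list[T]]:
    """Partition items into the classes of the equivalence x ~ y <=> key(x) == key(y)."""
    classes: dict[Hashable, list[T]] = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return dict(classes)


# Hypothetical items: (broad category, subject, year). The layout is illustrative only.
bank = [
    ("medical", "A. Smith", 1950),
    ("legal", "contract law", 1960),
    ("medical", "B. Jones", 1948),
    ("legal", "criminal law", 1971),
]

# Broad initial classification: two items are equivalent when their categories match.
by_category = classify(bank, key=lambda item: item[0])

# Retrieval consults a single class instead of scanning every item in the bank.
medical = by_category["medical"]
print(medical)  # [('medical', 'A. Smith', 1950), ('medical', 'B. Jones', 1948)]
```

Building the classes once costs a full pass over the bank, but afterwards each retrieval touches only the members of one class, which is the point of classifying at all.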

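The second half of the abstract's procedure can also be sketched. Assume each parameter of a canonical form comes with an equivalence on its semantic universe, represented here as a key function. This loses no generality, since every equivalence relation is the kernel of the map sending each value to its class. Two sentences of the same form are then equivalent when every corresponding pair of parameter values is equivalent (the conjunction), and tagging each class with its form keeps the classes of different forms disjoint, giving one partition of the whole bank. The forms `diagnosis` and `statute`, their parameters, and the chosen equivalences (birth decade, case-insensitive condition, country, hundred-year span) are hypothetical. The paper derives its equivalences from a type-theoretic structure that this sketch does not model.

```python
from collections.abc import Callable, Hashable, Sequence
from typing import Any

# One equivalence per parameter, given as a key function on that parameter's
# semantic universe: two values are equivalent when their keys are equal.
ParamKey = Callable[[Any], Hashable]
Sentence = tuple[str, tuple[Any, ...]]

# Hypothetical canonical forms and parameter equivalences (illustrative only).
CANONICAL_FORMS: dict[str, Sequence[ParamKey]] = {
    "diagnosis": (
        lambda birth_year: birth_year // 10 * 10,  # same birth decade
        lambda condition: condition.lower(),  # same condition, ignoring case
    ),
    "statute": (
        lambda country: country,  # same country
        lambda year: year // 100,  # same hundred-year span, e.g. 1800-1899
    ),
}


def class_key(sentence: Sentence) -> tuple[Hashable, ...]:
    """Conjunction of the parameter equivalences, tagged with the canonical form.

    Two sentences get the same key only if they share a canonical form and every
    pair of corresponding parameters is equivalent, so classes of different
    forms never overlap.
    """
    form, args = sentence
    keys = CANONICAL_FORMS[form]
    if len(args) != len(keys):
        raise ValueError(f"{form!r} expects {len(keys)} parameters, got {len(args)}")
    return (form,) + tuple(k(v) for k, v in zip(keys, args))


def classify_bank(sentences: list[Sentence]) -> dict[tuple[Hashable, ...], list[Sentence]]:
    """Partition the whole bank: the union of the per-form classifications."""
    classes: dict[tuple[Hashable, ...], list[Sentence]] = {}
    for s in sentences:
        classes.setdefault(class_key(s), []).append(s)
    return classes


bank: list[Sentence] = [
    ("diagnosis", (1952, "Measles")),
    ("diagnosis", (1957, "measles")),
    ("diagnosis", (1961, "measles")),
    ("statute", ("France", 1804)),
    ("statute", ("France", 1881)),
]
classes = classify_bank(bank)
print(len(classes))  # 3
```

Here the two 1950s measles records fall into one class, the 1961 record into another, and both French statutes into a third; refining or coarsening any single parameter's equivalence changes the partition without touching the rest of the scheme.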
Information Sciences