Fuzzy set theoretical approach to document retrieval

https://doi.org/10.1016/0306-4573(79)90031-1

Abstract

The aim of a document retrieval system is to deliver documents that contain the information needed by a given user of an information system. Documents are retrieved in response to a query by comparing the search patterns of the documents with the search pattern of the query. The quality of this process, i.e. the pertinence of the system's response to the user's information need, therefore depends on how accurately the search patterns represent the contents of the documents and of the query. Weighting the descriptors that enter document search patterns can be expected to improve the quality of the retrieval process.

Among known mathematical methods, the theory of fuzzy sets formulated by L.A. Zadeh is an apparatus that takes into account, in a natural manner, the fact that the grades of importance of descriptors in document search patterns range over a continuum. It is therefore adequate for describing a retrieval system for documents indexed by weighted descriptors.
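For orientation, the standard definitions from Zadeh's fuzzy set theory can be written out as below. The symbols $T$ (the set of descriptors), $d$ (a document) and $\mu_d$ (its weight function) are notation assumed here for illustration; the abstract does not give the paper's own notation.

```latex
% A fuzzy set A over a universe X is given by its membership function
\mu_A : X \to [0,1]

% Standard (Zadeh) operations on fuzzy sets A and B over X
\mu_{A \cup B}(x) = \max\bigl(\mu_A(x), \mu_B(x)\bigr)
\mu_{A \cap B}(x) = \min\bigl(\mu_A(x), \mu_B(x)\bigr)
\mu_{\bar{A}}(x)  = 1 - \mu_A(x)
A \subseteq B \iff \mu_A(x) \le \mu_B(x) \quad \text{for all } x \in X

% Assumed reading for retrieval: a document d indexed by weighted descriptors
% is a fuzzy subset of the descriptor set T, where \mu_d(t) is the weight of t in d
\mu_d : T \to [0,1]
```

Under this reading, an unweighted (binary) index is the special case in which $\mu_d$ takes only the values 0 and 1.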

The aim of this paper is to present a new method of document retrieval based on the fundamental operations of fuzzy set theory. We first introduce the basic notions, then give the syntax and semantics of the proposed document retrieval language, and finally describe an algorithm that allocates documents to particular queries and discuss its properties.

The basic advantage of using fuzzy set theory to describe a document retrieval system is that it accounts, in a simple way, for both the differing importance of descriptors in document search patterns and the differing grades of formal relevance of the documents in an information system to a given query. The documents with the highest grades of formal relevance to a given query (within the given information system) can be retrieved by applying simple operations of fuzzy set theory.
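The idea of grading documents against a query can be illustrated with a minimal sketch. It is not the paper's language or algorithm, which the abstract does not specify. It assumes the standard Zadeh operators (min for AND, max for OR, 1 - x for NOT), and the names `grade`, `retrieve`, the tuple-based query form, and the sample weights are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical query form: a descriptor name, or a nested tuple such as
# ("and", q1, q2, ...), ("or", q1, q2, ...), ("not", q).
Query = Union[str, tuple]
Weights = Dict[str, float]  # descriptor -> weight in [0, 1]


def grade(query: Query, weights: Weights) -> float:
    """Grade of one document for a query, using the standard Zadeh operators."""
    if isinstance(query, str):
        # A descriptor absent from the search pattern has weight 0.
        return weights.get(query, 0.0)
    op, args = query[0], query[1:]
    if op == "not":
        return 1.0 - grade(args[0], weights)
    parts = [grade(q, weights) for q in args]
    if op == "and":
        return min(parts)
    if op == "or":
        return max(parts)
    raise ValueError(f"unknown operator: {op!r}")


def retrieve(query: Query, collection: Dict[str, Weights],
             top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Documents with a non-zero grade, highest grade first."""
    scored = [(doc_id, grade(query, w)) for doc_id, w in collection.items()]
    scored = [(doc_id, g) for doc_id, g in scored if g > 0.0]
    # Sort by descending grade; break ties by document id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Illustrative data only (assumed weights, not taken from the paper).
docs = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.4},
    "d2": {"fuzzy": 0.3, "retrieval": 0.8, "indexing": 0.6},
    "d3": {"retrieval": 0.7},
}
query = ("and", "retrieval", ("or", "fuzzy", "indexing"))
print(retrieve(query, docs))
```

With these sample weights, `d2` receives grade min(0.8, max(0.3, 0.6)) = 0.6, `d1` receives min(0.4, max(0.9, 0)) = 0.4, and `d3` receives 0 and is not returned. Other operator choices (for example product-based conjunction) would give different rankings.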

References (28)

  • M. Dabrowski. A general model of distribution of objects in information retrieval systems. Inform. Systems (1975)
  • D. Hsiao et al. A formal system for information retrieval from files. Commun. ACM (1970)
  • K. Laus et al. A model of information retrieval process for hierarchical set of descriptors. Inform. Stor. Retr. (1974)
  • W. Lipski et al. O sformalizowanej teorii systemów informacyjnych. Podstawy Sterowania (1976)
  • W. Marek et al. Information storage and retrieval systems: mathematical foundations. Theoret. Comput. Sci. (1976)
  • G. Salton. Automatic Information Organization and Retrieval (1968)
  • G. Salton. Dynamic Information and Library Processing (1975)
  • W.M. Turski. On a model of information retrieval system based on thesaurus. Inform. Stor. Retr. (1971)
  • C.J. Van Rijsbergen. Information Retrieval (1975)
  • E. Wong et al. Canonical structure in attribute based file organization. Commun. ACM (1971)
  • L.A. Zadeh. Fuzzy sets. Inform. Control (1965)
  • L.A. Zadeh. Similarity relations and fuzzy orderings. Inform. Sci. (1971)
  • L.A. Zadeh. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Systems, Man and Cybernetics (1973)

A preliminary version of this paper was presented at the Sixth Cranfield International Conference on Mechanised Information Storage and Retrieval Systems (26–29 July 1977).