Direct selection of keywords for the KWIC index

https://doi.org/10.1016/0020-0271(69)90016-3

Abstract

Luhn's original KWIC index method was modified, in its keyword selection procedure, for the processing of the bibliographical journal Index Radiohygienicus. Instead of being selected by the computer, which checks the text against a stoplist of non-significant words, the keywords are marked directly in the input data.
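
The difference between the two selection procedures can be illustrated with a minimal Python sketch. It is not the program used for Index Radiohygienicus. The asterisk used to mark keywords, the sample title, the stoplist and all function names are hypothetical. Each selected keyword yields one index line showing it in its title context, and the lines are sorted alphabetically by keyword, as in a KWIC index.

```python
# Minimal KWIC index sketch contrasting two keyword-selection procedures.
# Hypothetical throughout: the "*" mark, the stoplist, the title, all names.

MARK = "*"  # hypothetical prefix written before each keyword in the input


def select_by_stoplist(title: str, stoplist: set[str]) -> tuple[list[str], list[int]]:
    """Luhn-style selection: every word absent from the stoplist is a keyword."""
    words = title.split()
    positions = [i for i, w in enumerate(words) if w.lower() not in stoplist]
    return words, positions


def select_by_marking(title: str) -> tuple[list[str], list[int]]:
    """Direct selection: only words marked by the indexer are keywords."""
    words: list[str] = []
    positions: list[int] = []
    for i, token in enumerate(title.split()):
        if token.startswith(MARK):
            positions.append(i)
            token = token[len(MARK):]  # the mark itself is not printed
        words.append(token)
    return words, positions


def kwic_lines(words: list[str], positions: list[int]) -> list[tuple[str, str, str]]:
    """Build one (KEYWORD, left context, right context) line per keyword."""
    return [
        (words[i].upper(), " ".join(words[:i]), " ".join(words[i + 1:]))
        for i in positions
    ]


def build_index(lines: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Sort the lines alphabetically by keyword, as in a printed KWIC index."""
    return sorted(lines)


if __name__ == "__main__":
    stoplist = {"on", "in", "of", "the"}
    plain = "Some remarks on strontium in milk"
    marked = "Some remarks on *strontium in *milk"

    for label, (words, pos) in [
        ("stoplist", select_by_stoplist(plain, stoplist)),
        ("direct marking", select_by_marking(marked)),
    ]:
        print(f"-- {label}")
        for keyword, left, right in build_index(kwic_lines(words, pos)):
            # Right-align the left context so keywords line up in one column.
            print(f"{left:>35}  {keyword}  {right}")
```

With the stoplist, this title yields index lines for SOME and REMARKS as well as STRONTIUM and MILK; with direct marking, only the two marked words are indexed.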

In this paper, the difficulties of applying Luhn's original method are illustrated by 11 stoplist variants used in the processing of Chemical Titles, and the main advantages of direct selection are analysed. These advantages are: (a) a reduction in computing time (by approximately 15%); (b) the possibility of eliminating from the index all words that are genuinely non-significant: each such word may occur only rarely (so it would not be economical to include it in the stoplist), yet together these words would make up a non-negligible portion of the index (approximately 10%); (c) the possibility of deciding, from the actual context, whether a given word should be treated as significant, which further reduces the useless bulk of the index. Direct selection of keywords in 100 titles takes no more than 30 min of human effort.
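
Advantages (b) and (c) can be made concrete by counting index entries for a few invented titles. In this hypothetical sketch, "lead" is significant as the element in one title and is a verb in another; no single stoplist can keep the first occurrence and drop the second, whereas direct marking decides for each occurrence. The titles, the stoplist and the asterisk marking convention are made up, and the resulting counts only illustrate the mechanism; they are not the paper's measurements.

```python
# Sketch: index entries produced by a stoplist versus by direct marking.
# The titles, the stoplist and the "*" marking convention are invented.

MARK = "*"  # hypothetical prefix marking a keyword chosen by the indexer

STOPLIST = {"in", "on", "of", "to", "that", "the"}

TITLES = [
    "*Lead in *drinking *water",  # "Lead": the element, significant
    "Factors that lead to *soil *contamination",  # "lead": a verb, not significant
    "Some remarks on *strontium in *milk",
]


def compare(marked_titles: list[str], stoplist: set[str]) -> tuple[int, int, list[str]]:
    """Count entries under each procedure and list words only the stoplist keeps."""
    via_stoplist = via_marking = 0
    only_stoplist: list[str] = []
    for title in marked_titles:
        for token in title.split():
            word = token.removeprefix(MARK)
            kept = word.lower() not in stoplist  # stoplist decision, context-free
            marked = token.startswith(MARK)  # indexer decision, per occurrence
            via_stoplist += int(kept)
            via_marking += int(marked)
            if kept and not marked:
                only_stoplist.append(word)
    return via_stoplist, via_marking, only_stoplist


if __name__ == "__main__":
    n_stop, n_marked, extra = compare(TITLES, STOPLIST)
    print(f"stoplist: {n_stop} entries, direct marking: {n_marked} entries")
    print("indexed only via the stoplist:", extra)
```

For these three titles the stoplist produces 11 entries against 7 for direct marking; the four extra entries are "Factors", "lead" (the verb), "Some" and "remarks".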

References (8)

  • J. Helbich, Application of the KWIC-index in Editing Index Radiohygienicus.
  • H.P. Luhn, Automatic Creation of Literature Abstracts, IBM J. Res. Develop. (1958).
  • H.P. Luhn, Keyword-in-context index for technical literature (KWIC-index), Am. Docum. (1960).
  • M.E. Stevens, Automatic indexing: A state-of-the-art report, Nat. Bur. Stand., Monograph 91 (1965).
The remaining four references are not reproduced here.

Presented before the Anglo-Czechoslovak Documentation Symposium, Ciba Foundation, London, May 20–26, 1967.
