Author:
Roger Bradford
Affiliation:
Agilex Technologies Inc, United States
Keyword(s):
Latent Semantic Indexing, LSI, Phrase-based Retrieval, Phrase Indexing.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Business Analytics; Data Analytics; Data Engineering; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems
Abstract:
Latent semantic indexing (LSI) is a well-established technique for information retrieval and data mining. The technique has been incorporated into a wide variety of practical applications, where it provides valuable capabilities for information search, categorization, clustering, and discovery. However, the technique has limitations. One is that the classical implementation of LSI does not provide a flexible mechanism for dealing with phrases. In both information retrieval and data mining applications, phrases can have significant value in specifying user information needs. In the classical implementation of LSI, a phrase can be used in a query only if it has been identified a priori and treated as a unit during creation of the LSI index. This requirement has greatly hindered the use of phrases in LSI applications. This paper presents a method for dealing with phrases in LSI-based information systems on an ad hoc basis: at query time, without requiring any prior knowledge of the phrases of interest. The approach is fast enough to be used during real-time query execution. This new capability can enhance the use of LSI in both information retrieval and knowledge discovery applications.