Lazy Query Enrichment: A Method for Indexing Large Specialized Document Bases with Morphology and Concept Hierarchy

Gelbukh, Alexander F.

doi:10.1007/3-540-44469-6_49

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1873)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

A full-text information retrieval system has to deal with various phenomena of string equivalence: case-insensitive matching, morphological inflection, derivation, synonymy, and hyponymy or hyperonymy. Technically, this can be handled either at indexing time, by reducing equivalent strings to a common form, or at query-processing time, by enriching the query with the whole set of equivalent forms. We argue that the latter approach allows for greater flexibility and easier maintenance, while being more affordable than is usually assumed. Our proposal is to enrich the query only with those forms that actually appear in the document base. Our experiments with a thesaurus-based information retrieval system showed only an insignificant average increase in query size on a 200-megabyte document base, even for the highly inflected Spanish language.
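
The idea stated in the abstract can be illustrated with a minimal sketch. It is not the implementation described in the paper: the tiny lexicon `FORMS`, the hierarchy `HYPONYMS`, the sample documents, and all function names are hypothetical and chosen only for illustration. The index stores exact surface forms without normalization; a query term is first expanded to all of its equivalent forms and then filtered down to the forms that actually occur in the index.

```python
from collections import defaultdict

# Hypothetical toy lexicon: lemma -> inflected forms.
# A real system would use a morphological generator instead.
FORMS = {
    "cantar": ["cantar", "canto", "cantas", "canta", "cantamos",
               "cantan", "canté", "cantaba"],
    "perro": ["perro", "perros"],
    "gato": ["gato", "gatos"],
}

# Hypothetical concept hierarchy: term -> direct hyponyms.
HYPONYMS = {"animal": ["perro", "gato"]}


def build_index(docs):
    """Index documents by exact lowercased surface form (no stemming)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def candidate_forms(term):
    """All forms equivalent to `term`: its inflections plus those of
    its hyponyms, collected transitively over the hierarchy."""
    seen, stack, forms = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        forms.update(FORMS.get(current, [current]))
        stack.extend(HYPONYMS.get(current, []))
    return forms


def enrich_lazily(term, index):
    """Keep only the candidate forms that occur in the document base."""
    return sorted(f for f in candidate_forms(term) if f in index)


def search(term, index):
    """Union of the posting sets of all surviving forms."""
    result = set()
    for form in enrich_lazily(term, index):
        result |= index[form]
    return result


docs = {
    1: "El perro canta",
    2: "Los gatos duermen",
    3: "Cantamos con los perros",
}
index = build_index(docs)

print(enrich_lazily("cantar", index))  # ['canta', 'cantamos']
print(enrich_lazily("animal", index))  # ['gatos', 'perro', 'perros']
print(search("animal", index))         # {1, 2, 3}
```

In this sketch the eight candidate forms of "cantar" shrink to the two that occur in the documents, which is the effect the abstract relies on: the enriched query grows with the vocabulary actually present in the document base rather than with the size of the full paradigm.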

Author information

Authors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Av. Juan de Dios Bátiz s/n esq. Mendizábal, Col. Zacatenco, C.P. 07738, D.F., Mexico
Alexander F. Gelbukh

Editor information

Editors and Affiliations

Department of Computing and Mathematical Sciences, Maritime Greenwich University Campus, 30 Park Row, London, SE10 9LS, UK
Mohamed Ibrahim
University of Linz, FAW, Altenbergerstr. 69, 4040, Linz, Austria
Josef Küng
Middlesex University, Bounds Green, London, UK
Norman Revell

About this paper

Cite this paper

Gelbukh, A.F. (2000). Lazy Query Enrichment: A Method for Indexing Large Specialized Document Bases with Morphology and Concept Hierarchy. In: Ibrahim, M., Küng, J., Revell, N. (eds) Database and Expert Systems Applications. DEXA 2000. Lecture Notes in Computer Science, vol 1873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44469-6_49

DOI: https://doi.org/10.1007/3-540-44469-6_49
Published: 28 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67978-3
Online ISBN: 978-3-540-44469-5
eBook Packages: Springer Book Archive
