Abstract
We develop a deductive data model for concept-based query expansion. It is based on three abstraction levels: the conceptual, linguistic and string levels. Concepts and the relationships among them are represented at the conceptual level. The linguistic level gives natural language expressions for concepts. Each expression has one or more matching patterns at the string level. The patterns specify how the expression is matched in database indices built in varying ways. The data model supports a declarative concept-based query expansion and formulation tool, the ExpansionTool, for heterogeneous IR system environments. Conceptual expansion is implemented by a novel intelligent operator for traversing transitive relationships in cyclic concept networks. The number of expansion links followed, their types, and their weights can be used to control expansion. A sample empirical experiment illustrating the use of the ExpansionTool in IR experiments is presented.
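The three levels and the controlled traversal described above can be illustrated with a minimal Python sketch. It is not the ExpansionTool implementation: the sample concepts, the dictionary layout, the function names `expand` and `query_patterns`, and the choice to multiply link weights along a path are all assumptions made for illustration only.

```python
from collections import deque

# Hypothetical conceptual level: concept -> (relation type, neighbour, link weight).
CONCEPT_LINKS = {
    "vehicle": [("narrower", "car", 0.9), ("narrower", "bicycle", 0.8)],
    "car": [("narrower", "taxi", 0.9), ("associative", "road", 0.5),
            ("broader", "vehicle", 0.9)],  # this link closes a cycle
    "bicycle": [("broader", "vehicle", 0.8)],
    "taxi": [],
    "road": [],
}

# Hypothetical linguistic level: concept -> natural language expressions.
EXPRESSIONS = {
    "vehicle": ["vehicle"],
    "car": ["car", "automobile"],
    "bicycle": ["bicycle"],
    "taxi": ["taxi"],
    "road": ["road"],
}

# Hypothetical string level: expression -> matching patterns for one index type.
PATTERNS = {
    "car": ["car", "cars"],
    "automobile": ["automobile", "automobiles"],
}


def expand(start, max_links, allowed_types, min_weight):
    """Breadth-first expansion from `start`.

    Returns {concept: weight of the first path that reached it}.
    Path weight is the product of link weights (an assumption of this sketch).
    """
    reached = {start: 1.0}
    queue = deque([(start, 0, 1.0)])
    while queue:
        concept, depth, weight = queue.popleft()
        if depth == max_links:
            continue  # limit on the number of expansion links followed
        for rel, neighbour, link_weight in CONCEPT_LINKS.get(concept, []):
            new_weight = weight * link_weight
            if rel not in allowed_types or new_weight < min_weight:
                continue  # control by link type and by weight
            if neighbour in reached:
                continue  # already visited: keeps traversal finite on cycles
            reached[neighbour] = new_weight
            queue.append((neighbour, depth + 1, new_weight))
    return reached


def query_patterns(concepts):
    """Map concepts to expressions, then to string-level matching patterns."""
    patterns = []
    for concept in sorted(concepts):
        for expression in EXPRESSIONS.get(concept, []):
            # Fall back to the expression itself when no pattern is listed.
            patterns.extend(PATTERNS.get(expression, [expression]))
    return patterns


if __name__ == "__main__":
    concepts = expand("vehicle", max_links=2,
                      allowed_types={"narrower"}, min_weight=0.0)
    print(sorted(concepts))
    print(query_patterns(concepts))
```

Keeping the levels in separate tables means the same conceptual expansion could be paired with a different pattern table for an index built in a different way, which is the kind of separation the abstract describes.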
Cite this article
Järvelin, K., Kekäläinen, J. & Niemi, T. ExpansionTool: Concept-Based Query Expansion and Construction. Information Retrieval 4, 231–255 (2001). https://doi.org/10.1023/A:1011998222190
DOI: https://doi.org/10.1023/A:1011998222190