Abstract
In information retrieval, users’ information needs are hard to identify. Many approaches address this problem by expanding the initial query and reweighting the terms of the expanded query using users’ relevance judgments. Although relevance feedback is most effective when users provide relevance information about retrieved documents, such information is not always available. An alternative is to expand queries with correlated terms; the main problem with this approach is how to construct term-term correlations that can be used effectively to improve retrieval performance. In this study, rather than reformulating initial queries using term correlations and/or users’ relevance feedback, we construct query concepts that denote users’ information needs from a document space. To form query concepts, we extract features from each document and cluster these features into primitive concepts, which are then used to form query concepts. Experiments are performed on the Associated Press (AP) dataset taken from the TREC collection. The experimental evaluation shows that our proposed framework, called QCM (Query Concept Method), outperforms a baseline probabilistic retrieval model on TREC retrieval.
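The three stages named in the abstract (feature extraction, clustering of features into primitive concepts, and formation of a query concept) can be illustrated with a toy sketch. The abstract does not specify how features are extracted or which clustering algorithm is used, so everything below is an assumption made for illustration only: the frequency-based `extract_features`, the greedy Jaccard-based `cluster_features`, the `query_concept` lookup, the stopword list, the thresholds, and the four sample documents are all hypothetical and are not the QCM implementation.

```python
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "for", "on"}


def extract_features(doc, top_k=3):
    """Assumed feature extractor: the top_k most frequent non-stopword terms."""
    terms = [t for t in doc.lower().split() if t.isalpha() and t not in STOPWORDS]
    return frozenset(term for term, _ in Counter(terms).most_common(top_k))


def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_features(features, threshold=0.3):
    """Assumed clustering step: greedy single-pass grouping of feature sets.

    Each feature set joins the first cluster whose accumulated terms are
    similar enough; otherwise it starts a new cluster. Each resulting
    cluster stands in for a "primitive concept".
    """
    clusters = []
    for f in features:
        for i, c in enumerate(clusters):
            if jaccard(f, c) >= threshold:
                clusters[i] = c | f
                break
        else:
            clusters.append(set(f))
    return clusters


def query_concept(query, primitive_concepts):
    """Assumed combination step: union of primitive concepts sharing a query term."""
    q = set(query.lower().split())
    concept = set()
    for c in primitive_concepts:
        if q & c:
            concept |= c
    return concept


# Invented sample documents; not taken from the AP dataset.
docs = [
    "stock market prices fall stock market prices",
    "market prices rise market prices rise stock",
    "football team wins football team wins match",
    "team football league team football league match",
]

primitive_concepts = cluster_features([extract_features(d) for d in docs])
stock_concept = query_concept("stock", primitive_concepts)
print(sorted(stock_concept))
```

In this sketch the query term "stock" is mapped to a set of terms drawn from the document space rather than being expanded through term-term correlations or relevance feedback, which is the contrast the abstract draws. How QCM actually weights or combines primitive concepts is not stated in the abstract.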
Cite this article
Chang, Y., Kim, M. & Raghavan, V.V. Construction of query concepts based on feature clustering of documents. Inf Retrieval 9, 231–248 (2006). https://doi.org/10.1007/s10791-006-0837-9
DOI: https://doi.org/10.1007/s10791-006-0837-9