Construction of query concepts based on feature clustering of documents

Chang, Youjin; Kim, Minkoo; Raghavan, Vijay V.

doi:10.1007/s10791-006-0837-9

Published: June 2006

Information Retrieval, volume 9, pages 231–248 (2006)

Youjin Chang¹, Minkoo Kim² & Vijay V. Raghavan³

Abstract

Because users’ information needs are hard to identify in information retrieval, many approaches address the problem by expanding initial queries and reweighting the terms in the expanded queries using users’ relevance judgments. Relevance feedback is most effective when users provide relevance information about retrieved documents, but such information is not always available. Another solution is to use correlated terms for query expansion. The main problem with this approach is how to construct term-term correlations that can be used effectively to improve retrieval performance. In this study, we construct query concepts that denote users’ information needs from a document space, rather than reformulating initial queries using term correlations and/or users’ relevance feedback. To form query concepts, we extract features from each document and then cluster the features into primitive concepts, which are in turn used to form query concepts. Experiments are performed on the Associated Press (AP) dataset taken from the TREC collection. The experimental evaluation shows that our proposed framework, called QCM (Query Concept Method), outperforms a baseline probabilistic retrieval model on TREC retrieval.
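The three stages named in the abstract (feature extraction, clustering of features into primitive concepts, and formation of a query concept) can be illustrated with a minimal sketch. The abstract does not say how QCM implements any of these stages, so everything below is an assumption made for illustration: tf-idf scoring to pick features, a single-pass Jaccard-overlap clustering, and the hypothetical function names `extract_features`, `cluster_features`, and `build_query_concept`. It should not be read as the authors' method.

```python
import math
from collections import Counter


def extract_features(docs, top_k=3):
    """Keep the top_k highest tf-idf terms of each document (assumed scoring)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    features = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
        # Sort by score (descending), then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        features.append(set(ranked[:top_k]))
    return features


def cluster_features(features, threshold=0.5):
    """Single-pass clustering: a feature joins the first cluster whose
    document set overlaps its own with Jaccard similarity >= threshold."""
    postings = {}
    for doc_id, feats in enumerate(features):
        for f in feats:
            postings.setdefault(f, set()).add(doc_id)
    clusters = []  # each entry: (set of terms, set of document ids)
    for term in sorted(postings):
        docs_t = postings[term]
        for terms, docs_c in clusters:
            if len(docs_t & docs_c) / len(docs_t | docs_c) >= threshold:
                terms.add(term)
                docs_c.update(docs_t)
                break
        else:
            clusters.append(({term}, set(docs_t)))
    return [terms for terms, _ in clusters]


def build_query_concept(query, primitive_concepts):
    """Union of the primitive concepts that share a term with the query."""
    q = set(query.lower().split())
    concept = set()
    for pc in primitive_concepts:
        if pc & q:
            concept |= pc
    return concept


# Toy corpus, invented for this example only.
docs = [
    "stock market shares fall",
    "stock market shares rise",
    "football match goal",
    "football match referee",
]
concepts = cluster_features(extract_features(docs))
print(sorted(build_query_concept("market crash", concepts)))
# ['fall', 'market', 'rise', 'shares']
```

In this toy run the query term "market" selects the finance-related primitive concept, so the resulting query concept contains terms such as "shares" and "fall" that the user never typed. This mirrors the idea of deriving the concept from the document space rather than from relevance feedback.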

Author information

Authors and Affiliations

¹ Graduate School of Information and Communication, Ajou University, Suwon, Korea: Youjin Chang
² Department of Information and Computer Engineering, Ajou University, Suwon, Korea: Minkoo Kim
³ The Center for Advanced Computer Studies, University of Louisiana, Lafayette, USA: Vijay V. Raghavan

Corresponding author

Correspondence to Youjin Chang.

About this article

Cite this article

Chang, Y., Kim, M. & Raghavan, V.V. Construction of query concepts based on feature clustering of documents. Inf Retrieval 9, 231–248 (2006). https://doi.org/10.1007/s10791-006-0837-9

Received: 10 April 2004
Revised: 03 March 2005
Accepted: 07 March 2005
Issue Date: June 2006
DOI: https://doi.org/10.1007/s10791-006-0837-9
