Discovering Relevant Features for Effective Query Formulation

Pipanmaekaporn, Luepol; Li, Yuefeng

doi:10.1007/978-3-642-31274-8_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7356)

Included in the following conference series:

Information Retrieval Facility Conference

Abstract

The quality of the features discovered in relevance feedback (RF) is the key issue in formulating effective search queries. Most existing feedback methods do not carefully address the issue of selecting features for noise reduction. As a result, noisy extracted features can easily degrade retrieval effectiveness. In this paper, we propose a novel feature extraction method for query formulation. The method first extracts term association patterns in RF as knowledge for feature extraction. Negative RF is then used to improve the quality of the discovered knowledge. A novel information filtering (IF) model is developed to evaluate the proposed method. Experimental results on Reuters Corpus Volume 1 and TREC topics confirm that the proposed model achieves encouraging performance compared to state-of-the-art IF models.
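The two steps named in the abstract (mining term association patterns from feedback, then using negative feedback to refine them) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that a "pattern" is a set of terms co-occurring in at least `min_support` positive documents, that a term's weight is the support of its patterns shared evenly among their terms, and that negative feedback subtracts a penalty per negative document containing the term. The function names, parameters, and scoring rule are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_term_patterns(docs, min_support=2, max_size=2):
    """Return term sets (as sorted tuples) found in at least min_support docs.

    Assumption: documents are whitespace-tokenised strings and a pattern is
    any set of 1..max_size distinct terms appearing in the same document.
    """
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for size in range(1, max_size + 1):
            counts.update(combinations(terms, size))
    return {p: c for p, c in counts.items() if c >= min_support}


def weight_terms(pos_docs, neg_docs, min_support=2, penalty=1.0):
    """Weight terms from positive patterns, then penalise by negative feedback.

    Assumption: each pattern shares its support equally among its terms, and
    every negative document containing a term subtracts `penalty` from it.
    """
    weights = Counter()
    for pattern, support in mine_term_patterns(pos_docs, min_support).items():
        for term in pattern:
            weights[term] += support / len(pattern)
    # Document frequency of each term in the negative feedback set.
    neg_df = Counter(t for doc in neg_docs for t in set(doc.lower().split()))
    return {t: w - penalty * neg_df[t] for t, w in weights.items()}


positive = ["oil price rise", "oil price fall", "oil market news"]
negative = ["price list today"]
print(weight_terms(positive, negative))
```

In this toy run only "oil" and "price" survive the support threshold, and "price" is down-weighted because it also appears in the negative document. A real system would rank the weighted terms and keep the top ones as the formulated query.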

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Australia
Luepol Pipanmaekaporn & Yuefeng Li

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Vienna, Austria
Michail Salampasis
Royal School of Library and Information Science, 2300, Copenhagen, Denmark
Birger Larsen

About this paper

Cite this paper

Pipanmaekaporn, L., Li, Y. (2012). Discovering Relevant Features for Effective Query Formulation. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_12

DOI: https://doi.org/10.1007/978-3-642-31274-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31273-1
Online ISBN: 978-3-642-31274-8
eBook Packages: Computer Science, Computer Science (R0)
