Discovering Relevant Features for Effective Query Formulation

  • Conference paper
Multidisciplinary Information Retrieval (IRFC 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7356)

Abstract

The quality of the features discovered from relevance feedback (RF) is a key issue for effective search queries. Most existing feedback methods do not carefully address feature selection for noise reduction. As a result, extracted noisy features can easily degrade effectiveness. In this paper, we propose a novel feature extraction method for query formulation. The method first extracts term association patterns from RF as knowledge for feature extraction. Negative RF is then used to improve the quality of the discovered knowledge. A novel information filtering (IF) model is developed to evaluate the proposed method. Experimental results on Reuters Corpus Volume 1 and TREC topics confirm that the proposed model achieves encouraging performance compared to state-of-the-art IF models.
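
The two-stage idea in the abstract (mine term associations from positive feedback, then use negative feedback to suppress noisy terms) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes the simplest possible stand-ins: co-occurring term pairs as "association patterns", pair support as the term weight, and a document-frequency penalty from negative feedback. The function names, `min_support`, and `penalty` are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_term_pairs(docs, min_support=2):
    """Return term pairs that co-occur in at least `min_support` documents.

    Assumption: a co-occurring pair is a crude stand-in for a
    "term association pattern"; the paper may define patterns differently.
    """
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def weight_terms(positive_docs, negative_docs, min_support=2, penalty=1.0):
    """Weight terms by pattern support, then penalise terms seen in negative RF."""
    patterns = mine_term_pairs(positive_docs, min_support)

    # Each term inherits the support of every frequent pair it belongs to.
    weights = Counter()
    for (a, b), support in patterns.items():
        weights[a] += support
        weights[b] += support

    # Document frequency of each term in the negative feedback documents.
    neg_df = Counter()
    for doc in negative_docs:
        neg_df.update(set(doc.lower().split()))

    # Hypothetical penalty: subtract a multiple of the negative document frequency
    # and drop terms whose score is no longer positive.
    scored = {t: w - penalty * neg_df[t] for t, w in weights.items()}
    return {t: s for t, s in scored.items() if s > 0}


def formulate_query(weights, k=5):
    """Pick the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


positive = ["oil price rise", "oil price fall", "oil export rise"]
negative = ["price of gold", "gold price rise"]
term_weights = weight_terms(positive, negative)
query_terms = formulate_query(term_weights, k=2)
```

In this toy input, "price" co-occurs with "oil" in the positive documents but also appears in both negative documents, so the penalty removes it from the query. That is the kind of noise reduction the abstract attributes to negative RF, shown here under the stated simplifications.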

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pipanmaekaporn, L., Li, Y. (2012). Discovering Relevant Features for Effective Query Formulation. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_12

  • DOI: https://doi.org/10.1007/978-3-642-31274-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31273-1

  • Online ISBN: 978-3-642-31274-8

  • eBook Packages: Computer Science, Computer Science (R0)