Abstract
The quality of features discovered in relevance feedback (RF) is key to formulating effective search queries. Most existing feedback methods do not carefully address how to select features so as to reduce noise. As a result, noisy extracted features can easily degrade retrieval effectiveness. In this paper, we propose a novel feature extraction method for query formulation. The method first extracts term association patterns from RF as knowledge for feature extraction. Negative RF is then used to improve the quality of the discovered knowledge. A novel information filtering (IF) model is developed to evaluate the proposed method. Experimental results on Reuters Corpus Volume 1 and TREC topics confirm that the proposed model achieves encouraging performance compared with state-of-the-art IF models.
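The abstract describes the approach only at a high level. As a rough illustration of the general idea (mine term co-occurrence patterns from positive feedback, turn them into term weights, use negative feedback to down-weight noisy terms, then score incoming documents), here is a minimal Python sketch. It is not the authors' model: restricting patterns to 2-term sets, splitting each pattern's support evenly across its terms, the penalty factor `alpha`, and all function names are hypothetical simplifications chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch: each document is a list of already-tokenized terms.


def mine_term_pairs(docs, min_support=2):
    """Count 2-term co-occurrence patterns (one count per document) and keep
    those occurring in at least `min_support` documents."""
    counts = Counter()
    for doc in docs:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def deploy_weights(patterns):
    """Turn patterns into term weights by splitting each pattern's support
    evenly over its terms (an assumed, simplified weighting scheme)."""
    weights = Counter()
    for pair, support in patterns.items():
        for term in pair:
            weights[term] += support / len(pair)
    return dict(weights)


def penalize_with_negatives(weights, negative_docs, alpha=0.5):
    """Reduce the weight of terms that also appear in negative feedback,
    in proportion to the fraction of negative documents containing them."""
    neg_df = Counter(term for doc in negative_docs for term in set(doc))
    n_neg = max(len(negative_docs), 1)
    return {t: w * (1 - alpha * neg_df[t] / n_neg) for t, w in weights.items()}


def score(doc, weights):
    """Filtering score of an incoming document: sum of its distinct terms' weights."""
    return sum(weights.get(t, 0.0) for t in set(doc))


positive = [["oil", "price", "opec"], ["oil", "price", "crude"], ["oil", "opec", "crude"]]
negative = [["oil", "spill", "beach"]]

patterns = mine_term_pairs(positive)
weights = penalize_with_negatives(deploy_weights(patterns), negative)
print(weights)
print(score(["crude", "price", "report"], weights), score(["oil", "spill"], weights))
```

In this toy run, "oil" co-occurs with every other frequent term and so receives the largest weight from positive feedback, but it also appears in the negative document and is halved to 1.5. A filtering system would then accept documents whose score exceeds some threshold; how that threshold is chosen is left open here.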
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pipanmaekaporn, L., Li, Y. (2012). Discovering Relevant Features for Effective Query Formulation. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_12
DOI: https://doi.org/10.1007/978-3-642-31274-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31273-1
Online ISBN: 978-3-642-31274-8
eBook Packages: Computer Science, Computer Science (R0)