DOI: 10.1145/1390334.1390394

Retrieval and feedback models for blog feed search

Published: 20 July 2008


Blog feed search poses challenges that differ from those of traditional ad hoc document retrieval: the units of retrieval, blogs, are themselves collections of documents, namely blog posts. In this work we adapt a state-of-the-art federated search model to the feed retrieval task and show a significant improvement over algorithms based on the best-performing submissions to the TREC 2007 Blog Distillation task [12]. We also show that typical query expansion techniques, such as pseudo-relevance feedback on the blog corpus, provide no significant performance improvement and in many cases dramatically hurt performance. We analyze the behavior of pseudo-relevance feedback on this task in depth and develop a novel query expansion technique that uses the link structure of Wikipedia. This technique provides significant and consistent performance improvements on this task, improving MAP over the unexpanded query by 22% for our baseline algorithm and by 14% for our federated algorithm.
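Because a blog is a collection of posts, one simple way to rank feeds is to aggregate evidence from a post-level ranking, and the gains reported above are changes in mean average precision (MAP). The Python sketch below illustrates both ideas under stated assumptions; it is not the paper's federated model or its evaluation code. The helper names (`rank_feeds`, `average_precision`, `mean_average_precision`, `relative_improvement`) and the `top_n` cutoff are hypothetical. `rank_feeds` assumes non-negative post scores and simply sums them per feed, a deliberate simplification only loosely in the spirit of resource-selection methods such as Si and Callan (2003). `relative_improvement` assumes the 22% and 14% figures are relative, not absolute, changes in MAP.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_feeds(
    post_ranking: List[Tuple[str, str, float]], top_n: int = 1000
) -> List[Tuple[str, float]]:
    """Rank feeds by summing the scores of their posts among the top_n posts.

    post_ranking holds (post_id, feed_id, score) triples, best first.
    Hypothetical helper: assumes non-negative scores (e.g. normalized
    probabilities); raw log scores would need converting first.
    """
    feed_scores: Dict[str, float] = defaultdict(float)
    for _post_id, feed_id, score in post_ranking[:top_n]:
        feed_scores[feed_id] += score
    # Highest aggregate score first; ties broken by feed id for determinism.
    return sorted(feed_scores.items(), key=lambda kv: (-kv[1], kv[0]))


def average_precision(ranked_ids: Iterable[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of a single ranked list."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP over the queries in qrels; a query with no run contributes 0."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)


def relative_improvement(baseline: float, treatment: float) -> float:
    """Relative change, e.g. 0.22 for a 22% improvement over the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treatment - baseline) / baseline


if __name__ == "__main__":
    posts = [
        ("p1", "blogA", 0.5),
        ("p2", "blogB", 0.375),
        ("p3", "blogA", 0.25),
        ("p4", "blogC", 0.125),
    ]
    feeds = rank_feeds(posts)
    print(feeds)  # [('blogA', 0.75), ('blogB', 0.375), ('blogC', 0.125)]
    ranked = [feed_id for feed_id, _ in feeds]
    print(mean_average_precision({"q1": ranked}, {"q1": {"blogA", "blogC"}}))
```

Summing rewards feeds with many retrieved posts, which matches the intuition that a relevant feed shows a recurring interest in the topic; averaging per feed would instead favor feeds with a few high-scoring posts. Which aggregation works better is an empirical question this sketch does not settle.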


References

[1] J. Arguello, J. L. Elsas, J. Callan, and J. G. Carbonell. Document representation and query expansion models for blog recommendation. In Proc. of the 2nd Intl. Conf. on Weblogs and Social Media (ICWSM), 2008.
[2] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7):107--117, 1998.
[3] J. Callan. Distributed information retrieval. In W. Croft, editor, Advances in Information Retrieval, pages 127--150. Kluwer Academic Publishers, 2000.
[4] C. Clarke, N. Craswell, and I. Soboroff. Overview of the TREC 2004 terabyte track. In Proc. of the 2004 Text Retrieval Conf., 2004.
[5] C. Clarke, F. Scholer, and I. Soboroff. Overview of the TREC 2005 terabyte track. In Proc. of the 2005 Text Retrieval Conf., 2005.
[6] F. Diaz and D. Metzler. Improving the estimation of relevance models using large external corpora. In Proc. of the 29th Annl. Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 154--161, 2006.
[7] J. Elsas, J. Arguello, J. Callan, and J. Carbonell. Retrieval and feedback models for blog distillation. In Proc. of the 2007 Text Retrieval Conf., 2007.
[8] D. Hannah, C. Macdonald, J. Peng, B. He, and I. Ounis. University of Glasgow at TREC 2007: Experiments with blog and enterprise tracks with Terrier. In Proc. of the 2007 Text Retrieval Conf., 2007.
[9] P. Kolari, A. Java, and T. Finin. Characterizing the splogosphere. In Proc. of the 3rd Annl. Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conf., 2006.
[10] V. Lavrenko and W. B. Croft. Relevance based language models. In Proc. of the 24th Annl. Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 120--127, 2001.
[11] C. Macdonald and I. Ounis. The TREC Blogs06 collection: Creating and analysing a blog test collection. Technical Report TR-2006-224, Department of Computing Science, U. of Glasgow, 2006.
[12] C. Macdonald, I. Ounis, and I. Soboroff. Overview of the TREC 2007 blog track. In Proc. of the 2007 Text Retrieval Conf., 2007.
[13] D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proc. of the 28th Annl. Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 472--479, 2005.
[14] D. Metzler, T. Strohman, H. Turtle, and W. Croft. Indri at TREC 2004: Terabyte track. In Proc. of the 2004 Text Retrieval Conf., 2004.
[15] D. Metzler, T. Strohman, Y. Zhou, and W. Croft. Indri at TREC 2005: Terabyte track. In Proc. of the 2005 Text Retrieval Conf., 2005.
[16] J. Seo and W. B. Croft. UMass at TREC 2007 blog distillation task. In Proc. of the 2007 Text Retrieval Conf., 2007.
[17] L. Si and J. Callan. Relevant document distribution estimation method for resource selection. In Proc. of the 26th Annl. Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2003.
[18] I. Soboroff, A. de Vries, and N. Craswell. Overview of the TREC 2006 enterprise track. In Proc. of the 2006 Text Retrieval Conf., 2006.
[19] C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems (TOIS), 22(2):179--214, 2004.






Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]



Association for Computing Machinery

New York, NY, United States





Author Tags

  1. blog retrieval
  2. federated search
  3. query expansion


  • Research-article



Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%


Other Metrics

Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 16 Feb 2025.



Cited By

  • (2020) Topic-aware Web Service Representation Learning. ACM Transactions on the Web 14:2, 1-23. DOI: 10.1145/3386041. Online publication date: 11-Apr-2020.
  • (2019) Bayesian Model Selection Approach to Multiple Change-Points Detection with Non-Local Prior Distributions. ACM Transactions on Knowledge Discovery from Data 13:5, 1-17. DOI: 10.1145/3340804. Online publication date: 24-Sep-2019.
  • (2017) Time sensitive blog retrieval using temporal properties of queries. Journal of Information Science 43:1, 103-121. DOI: 10.1177/0165551515618589. Online publication date: 1-Feb-2017.
  • (2017) Design Patterns for Fusion-Based Object Retrieval. Advances in Information Retrieval, 684-690. DOI: 10.1007/978-3-319-56608-5_66. Online publication date: 8-Apr-2017.
  • (2016) Analyzing user-generated online content for drug discovery: development and use of MedCrawler. Bioinformatics, btw782. DOI: 10.1093/bioinformatics/btw782. Online publication date: 22-Dec-2016.
  • (2016) Efficient distributed selective search. Information Retrieval Journal 20:3, 221-252. DOI: 10.1007/s10791-016-9290-6. Online publication date: 25-Nov-2016.
  • (2016) Detecting Vital Documents Using Negative Relevance Feedback in Distributed Realtime Computation Framework. Computational Linguistics, 193-208. DOI: 10.1007/978-981-10-0515-2_14. Online publication date: 20-Feb-2016.
  • (2015) Finding a needle in the blogosphere. Information Fusion 23:C, 58-68. DOI: 10.1016/j.inffus.2014.09.001. Online publication date: 1-May-2015.
  • (2015) Latent entity space: a novel retrieval approach for entity-bearing queries. Information Retrieval Journal 18:6, 473-503. DOI: 10.1007/s10791-015-9267-x. Online publication date: 11-Sep-2015.
  • (2015) Combining lexical and statistical translation evidence for cross-language information retrieval. Journal of the Association for Information Science and Technology 66:1, 23-39. DOI: 10.1002/asi.23153. Online publication date: 1-Jan-2015.
