A Learned Approach for Ranking News in Real-Time Using the Blogosphere

  • Conference paper
String Processing and Information Retrieval (SPIRE 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7024)

Abstract

Newspaper websites and news aggregators rank news stories by their newsworthiness in real time for display to the user. Recent work has shown that news stories can be ranked automatically in a retrospective manner based upon related discussion within the blogosphere. However, it is as yet undetermined whether blogs are sufficiently fresh to rank stories in real time. In this paper, we propose a novel learning-to-rank framework which leverages current blog posts to rank news stories in real time. We evaluate our proposed learning framework within the context of the TREC Blog track top stories identification task. Our results show that, indeed, the blogosphere can be leveraged for the real-time ranking of news, including for unpredictable events. Our approach improves upon state-of-the-art story ranking approaches, outperforming both the best TREC 2009/2010 systems and its single best-performing feature.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McCreadie, R., Macdonald, C., Ounis, I. (2011). A Learned Approach for Ranking News in Real-Time Using the Blogosphere. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds) String Processing and Information Retrieval. SPIRE 2011. Lecture Notes in Computer Science, vol 7024. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24583-1_11

  • DOI: https://doi.org/10.1007/978-3-642-24583-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24582-4

  • Online ISBN: 978-3-642-24583-1

  • eBook Packages: Computer Science, Computer Science (R0)
