Experiments with Google News for Filtering Newswire Articles

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6241)

Abstract

This paper describes an approach that uses Google News as a source of information to generate a learning corpus for an information filtering task. The INFILE (INformation FILtering Evaluation) track of the CLEF (Cross-Language Evaluation Forum) 2009 campaign was used as the evaluation framework. Because the information filtering task can be seen as a document classification task, a supervised learning scheme was followed. Two learning corpora were tested: one that uses the text of the topics as training data for a classifier, and another whose training data were generated from Google News pages, using the topic keywords as queries. Results show that using Google News to generate learning data does not improve on the results obtained using only the topic descriptions as the learning corpus.
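The idea of treating filtering as supervised classification over topic texts can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not name the classifier or the features, so a simple term-frequency profile with cosine similarity stands in here, and the topic names, topic texts, and threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text):
    """Return a sparse term-frequency vector as a Counter."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(corpus):
    """Build one profile per topic by summing the vectors of its training texts."""
    profiles = {}
    for topic, texts in corpus.items():
        profile = Counter()
        for text in texts:
            profile.update(vectorize(text))
        profiles[topic] = profile
    return profiles


def filter_document(profiles, document, threshold=0.2):
    """Return the best-matching topic, or None if no topic reaches the threshold."""
    doc_vec = vectorize(document)
    best_topic, best_score = None, 0.0
    for topic, profile in profiles.items():
        score = cosine(profile, doc_vec)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic if best_score >= threshold else None


# Hypothetical learning corpus built only from topic descriptions.
topic_corpus = {
    "doping": ["doping in sport athletes banned substances tests"],
    "avian_flu": ["avian flu virus outbreak poultry vaccine"],
}
profiles = train(topic_corpus)

print(filter_document(profiles, "Two athletes were banned after failing doping tests"))
print(filter_document(profiles, "The weather was mild today"))
```

A second learning corpus, for instance one assembled from news pages retrieved with the topic keywords, would have the same shape (a mapping from topic to a list of texts) and could be passed to the same `train` function, which makes the two configurations directly comparable.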



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Montejo-Ráez, A., Perea-Ortega, J.M., Díaz-Galiano, M.C., Ureña-López, L.A. (2010). Experiments with Google News for Filtering Newswire Articles. In: Peters, C., et al. Multilingual Information Access Evaluation I. Text Retrieval Experiments. CLEF 2009. Lecture Notes in Computer Science, vol 6241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15754-7_46

  • DOI: https://doi.org/10.1007/978-3-642-15754-7_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15753-0

  • Online ISBN: 978-3-642-15754-7

  • eBook Packages: Computer Science, Computer Science (R0)
