On the Long-Tail Entities in News

  • Conference paper
Advances in Information Retrieval (ECIR 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)

Abstract

Long-tail entities pose unique challenges for state-of-the-art entity linking systems because they are under-represented in general knowledge bases. This paper studies long-tail entities in news corpora. We conduct experiments on a large news collection of one million articles, devise an approach for measuring the volume of such entities in news, and uncover insights into the challenges of linking these entities to general knowledge bases.

Notes

  1. https://github.com/dbpedia-spotlight/

  2. https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Web-service

  3. http://mappings.dbpedia.org/server/ontology/classes/

Corresponding author

Correspondence to Dyaa Albakour.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Esquivel, J., Albakour, D., Martinez, M., Corney, D., Moussa, S. (2017). On the Long-Tail Entities in News. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_67

  • DOI: https://doi.org/10.1007/978-3-319-56608-5_67

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56607-8

  • Online ISBN: 978-3-319-56608-5

  • eBook Packages: Computer Science, Computer Science (R0)
