On-topic Cover Stories from News Archives

Schulte, Christian; Taneva, Bilyana; Weikum, Gerhard

doi:10.1007/978-3-319-16354-3_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9022)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Web and newspaper archives store large numbers of articles, but they also contain a great deal of near-duplicate information. Examples include articles about the same event published by multiple news agencies, and articles about evolving events that repeat paragraphs to provide background information. To support journalists who want to read all the information on a given topic at once, we propose an approach that, given a topic and a text collection, extracts a set of articles with broad coverage of the topic and a minimal number of duplicates.

We start by extracting articles related to the input topic and detecting duplicate paragraphs. We then keep only one instance from each group of duplicates by solving a weighted quadratic optimization problem. The optimization finds the best position for each paragraph, such that some articles consist mainly of distinct paragraphs and others consist mainly of duplicates. Finally, we present the articles with more distinct paragraphs to the reader. Our experiments show the high precision and recall of our approach.
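
To make the pipeline above concrete, here is a minimal Python sketch. It is not the method from the paper: the abstract does not specify how duplicates are detected or how the weighted quadratic optimization is formulated, so this sketch assumes word-shingle Jaccard similarity for duplicate detection and swaps in a simple greedy placement for the optimization step. All function names, the shingle size `k`, and the `threshold` value are hypothetical choices made for illustration.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a paragraph (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_duplicates(paragraphs, threshold=0.8):
    """Group paragraphs whose shingle sets overlap at or above the threshold.

    paragraphs: dict mapping (article_id, position) -> paragraph text.
    Returns a list of sets of keys; singleton sets are distinct paragraphs.
    """
    keys = list(paragraphs)
    parent = {key: key for key in keys}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sigs = {key: shingles(paragraphs[key]) for key in keys}
    for a, b in combinations(keys, 2):
        if jaccard(sigs[a], sigs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in keys:
        groups.setdefault(find(key), set()).add(key)
    return list(groups.values())


def rank_articles(paragraphs, threshold=0.8):
    """Greedy stand-in for the optimization step (an assumption, see lead-in).

    Keeps one instance per duplicate group, placing it in the article that
    already holds the most kept paragraphs, then ranks articles by the
    number of paragraphs they keep.
    """
    groups = group_duplicates(paragraphs, threshold)
    kept = {article: 0 for article, _ in paragraphs}

    # Distinct paragraphs always stay in their own article.
    for group in groups:
        if len(group) == 1:
            article, _ = next(iter(group))
            kept[article] += 1

    # Each duplicate group contributes exactly one kept instance.
    duplicate_groups = [g for g in groups if len(g) > 1]
    for group in sorted(duplicate_groups, key=lambda g: sorted(g)):
        candidates = sorted({article for article, _ in group})
        best = max(candidates, key=lambda art: kept[art])
        kept[best] += 1

    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `rank_articles` on a dictionary keyed by `(article_id, position)` returns the articles ordered by how many paragraphs they retain, so the articles at the top of the list are the ones a reader would be shown first. The pairwise comparison is quadratic in the number of paragraphs, which is acceptable for a sketch but would need an index such as MinHash for a real archive.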

Author information

Authors and Affiliations

Max-Planck Institute for Informatics, Saarbrücken, Germany
Christian Schulte & Gerhard Weikum
CNRS-LIG, Grenoble, France
Bilyana Taneva

Editor information

Editors and Affiliations

Vienna University of Technology, Institute of Software Technology and Interactive Systems, Favoritenstraße 9-11/188, 1040, Vienna, Austria
Allan Hanbury
Lumi, Semion Ltd., 111 Charterhouse Street, EC1M 6AW, London, UK
Gabriella Kazai
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstraße 9-11/188, 1040, Vienna, Austria
Andreas Rauber
Universität Duisburg-Essen, Lotharstraße 65, 47057, Duisburg, Germany
Norbert Fuhr

About this paper

Cite this paper

Schulte, C., Taneva, B., Weikum, G. (2015). On-topic Cover Stories from News Archives. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_4

DOI: https://doi.org/10.1007/978-3-319-16354-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16353-6
Online ISBN: 978-3-319-16354-3
eBook Packages: Computer Science, Computer Science (R0)
