Event-based summarization using a centrality-as-relevance model

Marujo, Luís; Ribeiro, Ricardo; Gershman, Anatole; de Matos, David Martins; Neto, João P.; Carbonell, Jaime

doi:10.1007/s10115-016-0966-4

Regular Paper
Published: 21 June 2016

Knowledge and Information Systems, Volume 50, pages 945–968 (2017)

Abstract

Event detection is a fundamental information extraction task, which has been explored largely in the context of question answering, topic detection and tracking, knowledge base population, news recommendation, and automatic summarization. In this article, we explore an event detection framework to improve a key phrase-guided centrality-based summarization model. Event detection is based on the fuzzy fingerprint method, which is able to detect all types of events in the ACE 2005 Multilingual Corpus. Our base summarization approach is a two-stage method that starts by extracting a collection of key phrases that will be used to help the centrality-as-relevance retrieval model. We explored three different ways to integrate event information, achieving state-of-the-art results in text and speech corpora: (1) filtering of nonevents, (2) event fingerprints as features, and (3) combination of filtering of nonevents and event fingerprints as features.
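
As a rough illustration of the pipeline the abstract describes, the following Python sketch ranks sentences by centrality in the spirit of support sets: every passage (and every key phrase) votes for the passages most similar to it, and a sentence's score is the number of votes it receives. This is a simplified sketch under stated assumptions, not the authors' implementation. The whitespace tokenizer, the raw term-frequency cosine similarity, the `support_size` parameter, and the `is_event` predicate (a stand-in for an event detector such as a fuzzy fingerprint classifier) are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased term-frequency vector for a passage (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_centrality(sentences, key_phrases=(), support_size=2, is_event=None):
    """Rank sentences by the number of support sets they appear in.

    Assumptions (not taken from the article): each support set holds the
    `support_size` most similar sentences with non-zero similarity, and
    `is_event` is a hypothetical predicate used to drop nonevent sentences.
    """
    if is_event is not None:
        # Optional filtering of sentences that describe no event.
        sentences = [s for s in sentences if is_event(s)]
    vectors = [bag_of_words(s) for s in sentences]
    # Key phrases own support sets (they vote) but are never ranked themselves.
    owners = vectors + [bag_of_words(k) for k in key_phrases]
    votes = [0] * len(sentences)
    for i, owner in enumerate(owners):
        scored = [(cosine(owner, vectors[j]), j)
                  for j in range(len(sentences)) if j != i]
        # Keep only sentences that share at least one term with the owner.
        scored = [(sim, j) for sim, j in scored if sim > 0.0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        for _, j in scored[:support_size]:
            votes[j] += 1
    order = sorted(range(len(sentences)), key=lambda j: votes[j], reverse=True)
    return [sentences[j] for j in order]


docs = ["bomb attack city", "attack city officials", "bomb attack", "sunny weather today"]
ranked = rank_by_centrality(docs, support_size=1)
filtered = rank_by_centrality(docs, support_size=1, is_event=lambda s: "attack" in s)
guided = rank_by_centrality(docs, key_phrases=["officials"], support_size=1)
```

Passing an `is_event` predicate loosely mirrors the first integration strategy in the abstract (filtering of nonevents), and passing `key_phrases` mirrors the key phrase-guided stage. Using event fingerprints as features would require a real event detector and is not attempted here.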

Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions. This work was supported by national funds through FCT under Project UID/CEC/50021/2013, the Carnegie Mellon Portugal Program, and Grant SFRH/BD/33769/2009.

Author information

Authors and Affiliations

Feedzai Research, Lisbon, Portugal
Luís Marujo
INESC-ID Lisboa, Lisbon, Portugal
Ricardo Ribeiro, David Martins de Matos & João P. Neto
Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal
Ricardo Ribeiro
School of Computer Science, CMU, Pittsburgh, USA
Anatole Gershman & Jaime Carbonell
Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
David Martins de Matos & João P. Neto

Corresponding author

Correspondence to Luís Marujo.

About this article

Cite this article

Marujo, L., Ribeiro, R., Gershman, A. et al. Event-based summarization using a centrality-as-relevance model. Knowl Inf Syst 50, 945–968 (2017). https://doi.org/10.1007/s10115-016-0966-4

Received: 30 December 2014
Revised: 29 November 2015
Accepted: 04 June 2016
Published: 21 June 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10115-016-0966-4
