research-article

Preserve Integrity in Realtime Event Summarization

Published: 03 May 2021

Abstract

Online text streams such as Twitter are a major information source for users following ongoing events. Realtime event summarization aims to generate and update coherent, concise summaries that describe the state of a given event. Because of the enormous volume of continuously arriving texts, realtime event summarization has become the de facto tool for facilitating information acquisition. However, current text summarization techniques leave a challenging yet unexplored issue open: how to preserve the integrity, i.e., the accuracy and consistency, of summaries during the update process. The issue is critical because online text streams are dynamic and conflicting information can spread during the event period. For example, conflicting numbers of deaths and injuries might be reported after an earthquake; such misleading information should not appear in the earthquake summary at any timestamp. In this article, we present a novel realtime event summarization framework called IAEA (Integrity-Aware Extractive-Abstractive realtime event summarization). Our key idea is to integrate an inconsistency detection module into a unified extractive–abstractive framework. In each update, an extractive module first extracts important new tweets, and the extraction is refined by explicitly detecting inconsistency between the new tweets and previous summaries. The extractive module captures sentence-level attention, which an abstractive module later uses to obtain word-level attention. Finally, the word-level attention is leveraged to rephrase words. We conduct comprehensive experiments on real-world datasets. To reduce the effort required to build sufficient training data, we also provide automatic labeling steps whose effectiveness has been empirically verified. Through these experiments, we demonstrate that IAEA generates better summaries, with consistent information, than state-of-the-art approaches.
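The abstract states that the sentence-level attention learned by the extractive module is reused by the abstractive module to obtain word-level attention, but it does not give the formula. The sketch below assumes (this is not confirmed by the abstract) that IAEA follows the multiplicative rule of the unified extractive–abstractive model of Hsu et al. (ACL 2018): at each decoding step, the attention on a source word is multiplied by the attention on the sentence (tweet) that contains it, and the products are renormalized over all source words. The function name `combine_attention`, its arguments, and the zero-weight fallback are hypothetical choices made for illustration.

```python
from typing import List


def combine_attention(word_attn: List[float], word_to_sent: List[int],
                      sent_attn: List[float]) -> List[float]:
    """Rescale word-level attention by the attention of each word's sentence.

    ASSUMPTION: this multiplicative rule follows Hsu et al. (ACL 2018); the
    IAEA abstract does not specify the exact combination.

    word_attn[m]    : decoder attention on source word m at one decoding step
    word_to_sent[m] : index of the sentence (tweet) that contains word m
    sent_attn[n]    : extractive (sentence-level) attention on sentence n
    """
    if len(word_attn) != len(word_to_sent):
        raise ValueError("word_attn and word_to_sent must have the same length")
    # Multiply each word's weight by the weight of the sentence it belongs to.
    scaled = [a * sent_attn[n] for a, n in zip(word_attn, word_to_sent)]
    total = sum(scaled)
    if total == 0.0:
        # Hypothetical fallback: every word lies in a zero-weight sentence,
        # so return the unmodified word attention instead of dividing by zero.
        return list(word_attn)
    # Renormalize so the result is again a distribution over source words.
    return [s / total for s in scaled]


if __name__ == "__main__":
    # Two tweets of two words each; the extractive module favours tweet 0.
    combined = combine_attention([0.25, 0.25, 0.25, 0.25], [0, 0, 1, 1], [0.9, 0.1])
    print([round(x, 2) for x in combined])  # [0.45, 0.45, 0.05, 0.05]
```

Because the products are renormalized, words in a tweet that the extractive step down-weights (for instance, after it has been flagged as inconsistent with the previous summary) receive little attention during rephrasing. Under this assumption, that is one way the integrity signal could propagate from the extractive module to the abstractive one.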

    Information

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 3
    June 2021
    533 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3454120
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 May 2021
    Accepted: 01 December 2020
    Revised: 01 October 2020
    Received: 01 April 2020
    Published in TKDD Volume 15, Issue 3

    Author Tags

    1. tweet summarization
    2. data integrity
    3. hierarchical deep neural network
    4. real-time event summarization

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • Joint Innovation Research Program of Fujian Province China
    • Natural Science Foundation of Fujian Province China
    • International Cooperation Projects of Fujian Province China
    • National Natural Science Foundation of China (Key Program)
    • Shanghai Committee of Science and Technology

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 34
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 20 Jan 2025

    Cited By

    • (2025) ATSumm: Auxiliary information enhanced approach for abstractive disaster tweet summarization with sparse training data. Knowledge-Based Systems (112969). DOI: 10.1016/j.knosys.2025.112969. Online publication date: Jan 2025.
    • (2024) Novel Genetic Optimization Techniques for Accurate Social Media Data Summarization and Classification Using Deep Learning Models. Technologies 12, 10 (199). DOI: 10.3390/technologies12100199. Online publication date: 15 Oct 2024.
    • (2024) OntoDSumm: Ontology-Based Tweet Summarization for Disaster Events. IEEE Transactions on Computational Social Systems 11, 2 (2724-2739). DOI: 10.1109/TCSS.2023.3266025. Online publication date: Apr 2024.
    • (2024) IKDSumm: Incorporating key-phrases into BERT for extractive disaster tweet summarization. Computer Speech & Language 87 (101649). DOI: 10.1016/j.csl.2024.101649. Online publication date: Aug 2024.
    • (2023) Unsupervised update summarization of news events. Pattern Recognition 144, C. DOI: 10.1016/j.patcog.2023.109839. Online publication date: 1 Dec 2023.
    • (2022) Complex Network Hierarchical Sampling Method Combining Node Neighborhood Clustering Coefficient with Random Walk. New Generation Computing 40, 3 (765-807). DOI: 10.1007/s00354-022-00179-x. Online publication date: 1 Sep 2022.
    • (2022) A Survey on Advancements of Real-Time Analytics Architecture Components. Computational Methods and Data Engineering (547-559). DOI: 10.1007/978-981-19-3015-7_41. Online publication date: 9 Sep 2022.
