Abstract
Event detection from text data is an active area of research. While the emphasis in the literature has been on event identification and labeling using a single data source, this work considers event and story line detection when using a large number of data sources. In this setting, it is natural for different events in the same domain, e.g., violence, sports, politics, to occur at the same time and for different story lines about the same event to emerge. To capture events in this setting, we propose an Offline algorithm that detects events and story lines about events for a target domain given a news article collection. Our algorithm leverages a multi-relational sentence-level semantic graph and well-known graph properties to identify overlapping events and story lines within the events. We then extend this algorithm for an Online setting. Both the Offline and Online approaches are evaluated using two large data sets containing millions of news articles from a large number of sources. Our empirical analysis shows that methods using the proposed semantic graph beat the state of the art in terms of precision and recall while providing more complete event summaries.








Notes
We have empirically validated this assumption across thousands of articles. While articles may discuss multiple events or multiple themes of a single event, paragraphs generally focus on a single story line in a single event.
Phrase- and sentence-level topic graphs have also been analyzed in other fields. For example, these types of graphs are sometimes called complex graphs and have been used for text summarization [7].
These documents are either published by Iraqi news agencies or contain a term or a phrase related to Iraq, e.g., Iraq, Baghdad, Gulf War.
Available at http://www.trec-ts.org.
References
Inma. http://www.inma.org/article/index.cfm/23899-credibility-of-online-newspapers
Nbcnews. http://www.nbcnews.com/technology/online-news-readership-overtakes-newspapers-124383
Statoids. http://www.statoids.com
Abdelhaq, H., Sengstock, C., Gertz, M.: Eventweet: online localized event detection from twitter. VLDB Endow. 6(12), 1326–1329 (2013)
Aggarwal, C.C., Subbian, K.: Event detection in social streams. In: SDM, vol. 12, pp. 624–635. SIAM (2012)
Allan, J., Papka, R., Lavrenko, V.: On-line new event detection and tracking. In: SIGIR, pp. 37–45. ACM (1998)
Antiqueira, L., Oliveira, O.N., da Fontoura Costa, L., Nunes, M.D.: A complex network approach to text summarization. Inf. Sci. 179(5), 584–599 (2009)
Barzilay, R., McKeown, K.R., Elhadad, M.: Information fusion in the context of multi-document summarization. In: ACL, pp. 550–557. ACL (1999)
Becker, H., Naaman, M., Gravano, L.: Beyond trending topics: real-world event identification on twitter. ICWSM 11, 438–441 (2011)
Brants, T., Chen, F., Farahat, A.: A system for new event detection. In: SIGIR, pp. 330–337. ACM (2003)
Chakrabarti, D., Punera, K.: Event summarization using tweets. ICWSM 11, 66–73 (2011)
Chen, F., Neill, D.B.: Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. In: KDD, pp. 1166–1175. ACM (2014)
Chua, F.C.T., Asur, S.: Automatic summarization of events from social media. In: ICWSM. Citeseer (2013)
Ferlez, J., Faloutsos, C., Leskovec, J., Mladenic, D., Grobelnik, M.: Monitoring network evolution using mdl. In: ICDE, pp. 1328–1330. IEEE (2008)
Fung, G.P.C., Yu, J.X., Yu, P.S., Lu, H.: Parameter free bursty events detection in text streams. In: VLDB, pp. 181–192. VLDB Endowment (2005)
Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Nat. Acad. Sci. 99(12), 7821–7826 (2002)
Guralnik, V., Srivastava, J.: Event detection from time series data. In: KDD, pp. 33–42. ACM (1999)
Lappas, T., Arai, B., Platakis, M., Kotsakos, D., Gunopulos, D.: On burstiness-aware search for document sequences. In: KDD, pp. 477–486. ACM (2009)
Lappas, T., Vieira, M.R., Gunopulos, D., Tsotras, V.J.: On the spatiotemporal burstiness of terms. In: VLDB, pp. 836–847 (2012)
Lei, K., Khadiwala, R., Chang, K.-C.T.: A twitter-based event detection and analysis system. In: ICDE (2012)
Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: KDD, pp. 497–506. ACM (2009)
Li, C., Sun, A., Datta, A.: Twevent: segment-based event detection from tweets. In: CIKM, pp. 155–164. ACM (2012)
Lin, C.X., Zhao, B., Mei, Q., Han, J.: Pet: a statistical model for popular events tracking in social communities. In: KDD, pp. 929–938. ACM (2010)
Mihalcea, R., Tarau, P.: A language independent algorithm for single and multiple document summarization. In: IJCNLP (2005)
Muthiah, S., Huang, B., Arredondo, J., Mares, D., Getoor, L., Katz, G., Ramakrishnan, N.: Planned protest modeling in news and social media. In: AAAI, pp. 3920–3927 (2015)
Nichols, J., Mahmud, J., Drews, C.: Summarizing sporting events using twitter. In: IUI, pp. 189–198. ACM (2012)
Nishihara, Y., Sato, K., Sunayama, W.: Event extraction and visualization for obtaining personal experiences from blogs. In: Human Interface and the Management of Information. Information and Interaction, pp. 315–324. Springer (2009)
Ramakrishnan, N., Butler, P., Muthiah, S., Self, N., Khandpur, R., Saraf, P., Wang, W., Cadena, J., Vullikanti, A., Korkmaz, G., et al.: ‘Beating the news’ with embers: forecasting civil unrest using open source indicators. In: KDD, pp. 1799–1808. ACM (2014)
Ritter, A., Etzioni, O., Clark, S., et al.: Open domain event extraction from twitter. In: KDD, pp. 1104–1112. ACM (2012)
Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes twitter users: real-time event detection by social sensors. In: WWW, pp. 851–860. ACM (2010)
Sayyadi, H., Hurst, M., Maykov, A.: Event detection and tracking in social streams. In: ICWSM (2009)
Shen, D., Sun, J.-T., Li, H., Yang, Q., Chen, Z.: Document summarization using conditional random fields. IJCAI 7, 2862–2867 (2007)
Wang, D., Ding, W.: A hierarchical pattern learning framework for forecasting extreme weather events. In: ICDM, pp. 1021–1026. IEEE (2015)
Wang, J., Tong, W., Yu, H., Li, M., Ma, X., Cai, H., Hanratty, T., Han, J.: Mining multi-aspect reflection of news events in twitter: Discovery, linking and presentation. In: ICDM, pp. 429–438. IEEE (2015)
Wang, X., Zhai, C., Hu, X., Sproat, R.: Mining correlated bursty topic patterns from coordinated text streams. In: KDD, pp. 784–793. ACM (2007)
Wei, Y., Singh, L., Gallagher, B., Buttler, D.: Overlapping target event and story line detection of online newspaper articles. In: DSAA (2016)
Weng, J., Lee, B.-S.: Event detection in twitter. ICWSM 11, 401–408 (2011)
Xie, W., Zhu, F., Jiang, J., Lim, E.-P., Wang, K.: Topicsketch: Real-time bursty topic detection from twitter. In: ICDM, pp. 837–846. IEEE (2013)
Xu, F., Uszkoreit, H., Li, H.: Automatic event and relation detection with seeds of varying complexity. In: AAAI Workshop Event Extraction and Synthesis, pp. 12–17 (2006)
Yang, Y., Pierce, T., Carbonell, J.: A study of retrospective and on-line event detection. In: SIGIR, pp. 28–36. ACM (1998)
Acknowledgements
We thank Abbie Taylor, Nili Yossinger, Lara Kinne, and Eleanor Swingewood for labeling data and for the subject matter expertise they provided throughout the process. This work was supported in part by the National Science Foundation (NSF) Grant SMA-1338507, the Georgetown University Mass Data Institute (MDI), and the Lawrence Livermore National Laboratory (LLNL) under Contract DE-AC52-07NA27344. Any opinions, findings, conclusions, and recommendations expressed in this work are those of the authors and do not necessarily reflect the views of NSF, MDI, or LLNL.
Additional information
This paper is an extended version of the DSAA’2016 paper “Overlapping target event and story line detection of online newspaper articles”.
Cite this article
Wei, Y., Singh, L., Buttler, D. et al. Using semantic graphs to detect overlapping target events and story lines from newspaper articles. Int J Data Sci Anal 5, 41–60 (2018). https://doi.org/10.1007/s41060-017-0066-x
DOI: https://doi.org/10.1007/s41060-017-0066-x