Using semantic graphs to detect overlapping target events and story lines from newspaper articles

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

Event detection from text data is an active area of research. While the emphasis in the literature has been on event identification and labeling using a single data source, this work considers event and story line detection when using a large number of data sources. In this setting, it is natural for different events in the same domain, e.g., violence, sports, politics, to occur at the same time and for different story lines about the same event to emerge. To capture events in this setting, we propose an Offline algorithm that detects events and story lines about events for a target domain given a news article collection. Our algorithm leverages a multi-relational sentence-level semantic graph and well-known graph properties to identify overlapping events and story lines within the events. We then extend this algorithm for an Online setting. Both the Offline and Online approaches are evaluated using two large data sets containing millions of news articles from a large number of sources. Our empirical analysis shows that methods using the proposed semantic graph beat the state of the art in terms of precision and recall while providing more complete event summaries.
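As a rough illustration of what a multi-relational sentence-level graph could look like, the following minimal sketch treats sentences as nodes and records a set of relation labels on each edge. It is not the algorithm proposed in the paper: the relation names (`shared_terms`, `same_article`), the toy tokenizer, the `min_shared` threshold, and the example sentences are all assumptions made for illustration, and plain connected components stand in for the graph properties the authors use, so the groups found here cannot overlap.

```python
from collections import defaultdict
from itertools import combinations

# Tiny stopword list, assumed for this toy example only.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "and", "to", "was", "were", "at", "by"}


def tokens(sentence):
    """Return the lowercased content words of a sentence (toy tokenizer)."""
    words = (w.strip(".,;:!?\"'").lower() for w in sentence.split())
    return {w for w in words if w and w not in STOPWORDS}


def build_graph(sentences, min_shared=2):
    """Build {(i, j): set of relation labels} over sentence indices.

    sentences: list of (article_id, text) pairs.
    An edge may carry several relations, which makes the graph multi-relational.
    """
    edges = defaultdict(set)
    toks = [tokens(text) for _, text in sentences]
    for i, j in combinations(range(len(sentences)), 2):
        if len(toks[i] & toks[j]) >= min_shared:
            edges[(i, j)].add("shared_terms")   # hypothetical relation name
        if sentences[i][0] == sentences[j][0]:
            edges[(i, j)].add("same_article")   # hypothetical relation name
    return dict(edges)


def components(n, edges, relation):
    """Connected components over n nodes, using only edges carrying `relation`."""
    adj = defaultdict(set)
    for (i, j), labels in edges.items():
        if relation in labels:
            adj[i].add(j)
            adj[j].add(i)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


# Invented example sentences from three hypothetical articles.
sentences = [
    ("a1", "A car bomb exploded in Baghdad market."),
    ("a2", "The Baghdad market bomb killed ten people."),
    ("a3", "The football team won the final match."),
    ("a3", "Fans celebrated the final match victory."),
]
graph = build_graph(sentences)
candidate_events = components(len(sentences), graph, "shared_terms")
print(candidate_events)
```

Keeping the relation labels on the edges, rather than building one graph per relation, lets the same structure be queried by relation type; a fuller treatment would replace the component search with a method that allows a sentence to belong to more than one event.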

Notes

  1. We have empirically validated this assumption across thousands of articles. While an article may discuss multiple events or multiple themes of a single event, a paragraph generally focuses on a single story line in a single event.

  2. Phrase-level and sentence-level topic graphs have also been analyzed in other fields. For example, these types of graphs are sometimes called complex graphs and have been used for text summarization [7].

  3. These documents are either published by Iraqi news agencies or contain a term or a phrase related to Iraq, e.g., Iraq, Baghdad, Gulf War.

  4. Available at http://www.trec-ts.org.

References

  1. Inma. http://www.inma.org/article/index.cfm/23899-credibility-of-online-newspapers

  2. Nbcnews. http://www.nbcnews.com/technology/online-news-readership-overtakes-newspapers-124383

  3. Statoids. http://www.statoids.com

  4. Abdelhaq, H., Sengstock, C., Gertz, M.: Eventweet: online localized event detection from twitter. VLDB Endow. 6(12), 1326–1329 (2013)

  5. Aggarwal, C.C., Subbian, K.: Event detection in social streams. In: SDM, vol. 12, pp. 624–635. SIAM (2012)

  6. Allan, J., Papka, R., Lavrenko, V.: On-line new event detection and tracking. In: SIGIR, pp. 37–45. ACM (1998)

  7. Antiqueira, L., Oliveira, O.N., da Fontoura Costa, L., Nunes, M.D.: A complex network approach to text summarization. Inf. Sci. 179(5), 584–599 (2009)

  8. Barzilay, R., McKeown, K.R., Elhadad, M.: Information fusion in the context of multi-document summarization. In: ACL, pp. 550–557. ACL (1999)

  9. Becker, H., Naaman, M., Gravano, L.: Beyond trending topics: real-world event identification on twitter. ICWSM 11, 438–441 (2011)

  10. Brants, T., Chen, F., Farahat, A.: A system for new event detection. In: SIGIR, pp. 330–337. ACM (2003)

  11. Chakrabarti, D., Punera, K.: Event summarization using tweets. ICWSM 11, 66–73 (2011)

  12. Chen, F., Neill, D.B.: Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs. In: KDD, pp. 1166–1175. ACM (2014)

  13. Chua, F.C.T., Asur, S.: Automatic summarization of events from social media. In: ICWSM. Citeseer (2013)

  14. Ferlez, J., Faloutsos, C., Leskovec, J., Mladenic, D., Grobelnik, M.: Monitoring network evolution using mdl. In: ICDE, pp. 1328–1330. IEEE (2008)

  15. Fung, G.P.C., Yu, J.X., Yu, P.S., Lu, H.: Parameter free bursty events detection in text streams. In: VLDB, pp. 181–192. VLDB Endowment (2005)

  16. Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Nat. Acad. Sci. 99(12), 7821–7826 (2002)

  17. Guralnik, V., Srivastava, J.: Event detection from time series data. In: KDD, pp. 33–42. ACM (1999)

  18. Lappas, T., Arai, B., Platakis, M., Kotsakos, D., Gunopulos, D.: On burstiness-aware search for document sequences. In: KDD, pp. 477–486. ACM (2009)

  19. Lappas, T., Vieira, M.R., Gunopulos, D., Tsotras, V.J.: On the spatiotemporal burstiness of terms. In: VLDB, pp. 836–847 (2012)

  20. Lei, K., Khadiwala, R., Chang, K.-C.T.: A twitter-based event detection and analysis system. In: ICDE (2012)

  21. Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: KDD, pp. 497–506. ACM (2009)

  22. Li, C., Sun, A., Datta, A.: Twevent: segment-based event detection from tweets. In: CIKM, pp. 155–164. ACM (2012)

  23. Lin, C.X., Zhao, B., Mei, Q., Han, J.: Pet: a statistical model for popular events tracking in social communities. In: KDD, pp. 929–938. ACM (2010)

  24. Mihalcea, R., Tarau, P.: A language independent algorithm for single and multiple document summarization. In: IJCNLP (2005)

  25. Muthiah, S., Huang, B., Arredondo, J., Mares, D., Getoor, L., Katz, G., Ramakrishnan, N.: Planned protest modeling in news and social media. In: AAAI, pp. 3920–3927 (2015)

  26. Nichols, J., Mahmud, J., Drews, C.: Summarizing sporting events using twitter. In: IUI, pp. 189–198. ACM (2012)

  27. Nishihara, Y., Sato, K., Sunayama, W.: Event extraction and visualization for obtaining personal experiences from blogs. In: Human Interface and the Management of Information. Information and Interaction, pp. 315–324. Springer (2009)

  28. Ramakrishnan, N., Butler, P., Muthiah, S., Self, N., Khandpur, R., Saraf, P., Wang, W., Cadena, J., Vullikanti, A., Korkmaz, G., et al.: ‘Beating the news’ with embers: forecasting civil unrest using open source indicators. In: KDD, pp. 1799–1808. ACM (2014)

  29. Ritter, A., Etzioni, O., Clark, S., et al.: Open domain event extraction from twitter. In: KDD, pp. 1104–1112. ACM (2012)

  30. Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes twitter users: real-time event detection by social sensors. In: WWW, pp. 851–860. ACM (2010)

  31. Sayyadi, H., Hurst, M., Maykov, A.: Event detection and tracking in social streams. In: ICWSM (2009)

  32. Shen, D., Sun, J.-T., Li, H., Yang, Q., Chen, Z.: Document summarization using conditional random fields. IJCAI 7, 2862–2867 (2007)

  33. Wang, D., Ding, W.: A hierarchical pattern learning framework for forecasting extreme weather events. In: ICDM, pp. 1021–1026. IEEE (2015)

  34. Wang, J., Tong, W., Yu, H., Li, M., Ma, X., Cai, H., Hanratty, T., Han, J.: Mining multi-aspect reflection of news events in twitter: Discovery, linking and presentation. In: ICDM, pp. 429–438. IEEE (2015)

  35. Wang, X., Zhai, C., Hu, X., Sproat, R.: Mining correlated bursty topic patterns from coordinated text streams. In: KDD, pp. 784–793. ACM (2007)

  36. Wei, Y., Singh, L., Gallagher, B., Buttler, D.: Overlapping target event and story line detection of online newspaper articles. In: DSAA (2016)

  37. Weng, J., Lee, B.-S.: Event detection in twitter. ICWSM 11, 401–408 (2011)

  38. Xie, W., Zhu, F., Jiang, J., Lim, E.-P., Wang, K.: Topicsketch: Real-time bursty topic detection from twitter. In: ICDM, pp. 837–846. IEEE (2013)

  39. Xu, F., Uszkoreit, H., Li, H.: Automatic event and relation detection with seeds of varying complexity. In: AAAI Workshop Event Extraction and Synthesis, pp. 12–17 (2006)

  40. Yang, Y., Pierce, T., Carbonell, J.: A study of retrospective and on-line event detection. In: SIGIR, pp. 28–36. ACM (1998)

Acknowledgements

We thank Abbie Taylor, Nili Yossinger, Lara Kinne, and Eleanor Swingewood for labeling data and their general subject matter expertise provided throughout the process. This work was supported in part by the National Science Foundation (NSF) Grant SMA-1338507, the Georgetown University Mass Data Institute (MDI), and the Lawrence Livermore National Laboratory (LLNL) under Contract DE-AC52-07NA27344. Any opinions, findings, conclusions, and recommendations expressed in this work are those of the authors and do not necessarily reflect the views of NSF, MDI, or LLNL.

Author information

Corresponding author

Correspondence to Yifang Wei.

Additional information

This paper is an extended version of the DSAA’2016 paper “Overlapping target event and story line detection of online newspaper articles”.

About this article

Cite this article

Wei, Y., Singh, L., Buttler, D. et al. Using semantic graphs to detect overlapping target events and story lines from newspaper articles. Int J Data Sci Anal 5, 41–60 (2018). https://doi.org/10.1007/s41060-017-0066-x

  • DOI: https://doi.org/10.1007/s41060-017-0066-x
