The big data of violent events: algorithms for association analysis using spatio-temporal storytelling

Abstract

This paper proposes three methods of association analysis that address two challenges of Big Data: capturing relatedness among real-world events in high data volumes, and modeling similar events that are described disparately under high data variability. The proposed methods take as input a set of geotemporally encoded text streams about violent events, called “storylines”. These storylines are associated for two purposes: to investigate whether an event could occur again, and to measure influence, i.e., how one event could help explain the occurrence of another. The first proposed method, Distance-based Bayesian Inference, uses spatial distance to relate similar events that are described differently, addressing the challenge of high variability. The second and third methods, Spatial Association Index and Spatio-logical Inference, measure the influence of storylines in different locations, dealing with the high-volume challenge. Extensive experiments on social unrest in Mexico and wars in the Middle East showed that these methods can achieve precision and recall as high as 80% in retrieval tasks that use both keywords and geospatial information as search criteria. In addition, the experiments demonstrated high effectiveness in uncovering real-world storylines for exploratory analysis.
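To make the evaluation setting concrete, the following is a minimal sketch of a retrieval task that combines a keyword with a geospatial criterion and is then scored by precision and recall. It is not the paper's method: the `Event` record, the `retrieve` and `precision_recall` helpers, the 50 km radius, and the toy data are all assumptions introduced for illustration. Only the set-based definitions of precision and recall and the haversine distance are standard.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    """Hypothetical geotemporally encoded event record (not the paper's schema)."""
    event_id: str
    text: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def retrieve(events, keyword, center, radius_km):
    """Return ids of events that mention `keyword` within `radius_km` of `center`."""
    lat0, lon0 = center
    return {
        e.event_id
        for e in events
        if keyword.lower() in e.text.lower()
        and haversine_km(lat0, lon0, e.lat, e.lon) <= radius_km
    }


def precision_recall(retrieved, relevant):
    """Set-based precision and recall; 0.0 when a denominator is empty."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up toy data for illustration only.
events = [
    Event("e1", "Teachers protest in the city centre", 19.43, -99.13),
    Event("e2", "Protest blocks a highway", 19.50, -99.20),
    Event("e3", "Protest outside the state congress", 25.67, -100.31),
    Event("e4", "Football match ends in a draw", 19.40, -99.10),
]
relevant = {"e1", "e2", "e3"}  # hypothetical ground-truth labels

retrieved = retrieve(events, "protest", center=(19.43, -99.13), radius_km=50)
precision, recall = precision_recall(retrieved, relevant)
```

In this toy run the radius filter excludes the distant event `e3`, so every retrieved event is relevant but one relevant event is missed. This is the kind of trade-off between the two measures that a combined keyword and geospatial query has to balance.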

Notes

  1. Civil unrest denotes an event of social impact, such as a strike or a protest.

  2. An actor can be a political organization, the military, a militia, a terrorist organization, or an individual, among others.

Author information

Corresponding author

Correspondence to Raimundo F. Dos Santos Jr.

About this article

Cite this article

Dos Santos, R.F., Boedihardjo, A., Shah, S. et al. The big data of violent events: algorithms for association analysis using spatio-temporal storytelling. Geoinformatica 20, 879–921 (2016). https://doi.org/10.1007/s10707-016-0247-0

  • DOI: https://doi.org/10.1007/s10707-016-0247-0
