Dynamic event type recognition and tagging for data-driven insights in law-enforcement

Published in: Computing

Abstract

In law enforcement, investigators are typically tasked with analyzing large collections of evidence in order to identify and extract key information that supports investigation cases. In this context, events are key elements that help investigators understand and reconstruct what happened from the collection of evidence items. With the ever-increasing amount of data (e.g., e-mails and content from social media) gathered today as part of investigation tasks (for the most part done manually), managing this data is challenging, and important details that could be highly relevant to a case are easily missed. In this paper, we aim to facilitate the work of investigators through a framework for deriving insights from data. We focus on the automatic recognition and dynamic tagging of event types (e.g., phone calls) in textual evidence items, and propose a framework that facilitates these tasks and supports insight and discovery. Experimental results obtained by applying our approach to a real legal dataset demonstrate the feasibility of our proposal, which achieves good performance in automatically recognizing and tagging event types of interest.

Notes

  1. https://github.com/berkmancenter/mediacloud-sentence-splitter.

  2. https://nlp.stanford.edu/software/openie.html.

  3. https://nlp.stanford.edu/software/tagger.shtml.

  4. https://spacy.io/api/lemmatizer.

  5. We set an initial threshold of 70%; this parameter can be tuned as needed.

  6. See Footnote 1.

  7. See Footnote 4.

  8. We choose bigrams and trigrams because, based on our observations, averaging the vectors of more than three words results in an embedding that is not semantically meaningful.

  9. https://nlp.stanford.edu/software/CRF-NER.html.

  10. https://code.google.com/archive/p/word2vec/.

  11. https://github.com/facebookresearch/fastText.

  12. https://nlp.stanford.edu/projects/glove/.

  13. http://conceptnet.io/.

  14. Term Frequency Inverse Document Frequency.
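
Footnotes 5 and 8 can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors, the function names, and the reading of the 70% threshold as a cosine-similarity cut-off are all assumptions made for illustration. In practice the vectors would come from a pretrained embedding model such as those listed in Footnotes 10 to 12.

```python
import math

# Hypothetical toy word vectors (illustration only; a real system would
# load pretrained embeddings such as word2vec, fastText, or GloVe).
TOY_VECTORS = {
    "phone": [0.9, 0.1, 0.0],
    "call": [0.8, 0.2, 0.1],
    "telephone": [0.85, 0.15, 0.05],
    "bank": [0.0, 0.9, 0.3],
    "transfer": [0.1, 0.8, 0.4],
}


def average_vector(words, vectors):
    """Return the component-wise mean of the vectors of the given words."""
    rows = [vectors[w] for w in words]
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ngrams(tokens, sizes=(2, 3)):
    """Yield bigrams and trigrams only, following Footnote 8."""
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def matching_ngrams(tokens, seed_words, vectors, threshold=0.70):
    """Return (n-gram, score) pairs whose averaged vector is at least
    `threshold` similar to the averaged seed vector.

    Assumption: the 70% value of Footnote 5 is treated here as a
    cosine-similarity threshold; the source does not state the measure.
    """
    seed = average_vector(seed_words, vectors)
    hits = []
    for gram in ngrams(tokens):
        if all(w in vectors for w in gram):  # skip out-of-vocabulary n-grams
            score = cosine(average_vector(gram, vectors), seed)
            if score >= threshold:
                hits.append((gram, score))
    return hits


if __name__ == "__main__":
    tokens = ["telephone", "call", "bank", "transfer"]
    for gram, score in matching_ngrams(tokens, ["phone", "call"], TOY_VECTORS):
        print(" ".join(gram), round(score, 3))
```

With these toy vectors, the bigram "telephone call" scores close to the seed "phone call", while "bank transfer" falls well below the threshold. Mixed n-grams such as "telephone call bank" can still pass, which hints at why averaging over longer spans dilutes meaning and why the threshold is left tunable.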

Acknowledgements

We acknowledge Data to Decisions CRC (D2D-CRC) for funding this research.

Author information

Corresponding author

Correspondence to Carlos Rodríguez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Carlos Rodríguez, Reza Nouri: This work was done while the authors were at UNSW Sydney.

About this article

Cite this article

Zamanirad, S., Benatallah, B., Barukh, M.C. et al. Dynamic event type recognition and tagging for data-driven insights in law-enforcement. Computing 102, 1627–1651 (2020). https://doi.org/10.1007/s00607-020-00791-z

  • DOI: https://doi.org/10.1007/s00607-020-00791-z
