Dynamic event type recognition and tagging for data-driven insights in law-enforcement

Zamanirad, Shayan; Benatallah, Boualem; Barukh, Moshe Chai; Rodriguez, Carlos; Nouri, Reza

doi:10.1007/s00607-020-00791-z

Published: 31 January 2020

Computing, volume 102, pages 1627–1651 (2020)

Shayan Zamanirad¹,
Boualem Benatallah¹,
Moshe Chai Barukh¹,
Carlos Rodriguez² (ORCID: orcid.org/0000-0002-7263-821X) &
Reza Nouri³

Abstract

In law enforcement, investigators are typically tasked with analyzing large collections of evidence in order to identify and extract key information to support investigation cases. In this context, events are key elements that help in understanding and reconstructing what happened from the collection of evidence items. With the ever-increasing amount of data (e.g., e-mails and social media content) gathered today as part of investigation tasks, which are for the most part carried out manually, managing the data is challenging, and important details of high relevance to a case are easily missed. In this paper, we aim to facilitate the work of investigators through a framework for deriving insights from data. We focus on the automatic recognition and dynamic tagging of event types (e.g., phone calls) in textual evidence items, and propose a framework that facilitates these tasks and supports insight and discovery. Experimental results obtained by applying our approach to a real legal dataset demonstrate the feasibility of our proposal, achieving good performance in automatically recognizing and tagging event types of interest.

Notes

1. https://github.com/berkmancenter/mediacloud-sentence-splitter
2. https://nlp.stanford.edu/software/openie.html
3. https://nlp.stanford.edu/software/tagger.shtml
4. https://spacy.io/api/lemmatizer
5. We set the initial threshold to 70%; this parameter can be tuned as needed.
6. See Footnote 1.
7. See Footnote 4.
8. We choose bigrams and trigrams because, in our observation, averaging the vectors of more than three words yields an embedding that is not semantically meaningful.
9. https://nlp.stanford.edu/software/CRF-NER.html
10. https://code.google.com/archive/p/word2vec/
11. https://github.com/facebookresearch/fastText
12. https://nlp.stanford.edu/projects/glove/
13. http://conceptnet.io/
14. Term frequency-inverse document frequency (TF-IDF).
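
The notes above on the 70% threshold and on averaging bigram and trigram vectors can be illustrated with a small sketch. It is not the authors' implementation: the toy embeddings, the function names, and the choice of cosine similarity as the quantity compared against the threshold are all assumptions made for illustration. In practice the vectors would presumably come from a pretrained model such as word2vec, fastText, or GloVe.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only (assumption).
TOY_EMBEDDINGS = {
    "phone": [0.9, 0.1, 0.0],
    "call": [0.8, 0.2, 0.1],
    "called": [0.7, 0.3, 0.1],
    "bank": [0.0, 0.9, 0.2],
    "transfer": [0.1, 0.8, 0.3],
}


def average_vectors(vectors):
    """Return the component-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ngrams(tokens, sizes=(2, 3)):
    """Yield every bigram and trigram (by default) of the token list."""
    for size in sizes:
        for start in range(len(tokens) - size + 1):
            yield tuple(tokens[start:start + size])


def match_ngrams(tokens, seed_phrase, embeddings, threshold=0.70):
    """Return (n-gram, similarity) pairs whose averaged vector is at least
    `threshold`-similar to the averaged vector of the seed phrase.

    The 0.70 default mirrors the 70% threshold in the notes; applying it
    to cosine similarity is an assumption of this sketch.
    """
    seed_vector = average_vectors([embeddings[word] for word in seed_phrase])
    matches = []
    for gram in ngrams(tokens):
        # Skip n-grams containing out-of-vocabulary words.
        if not all(word in embeddings for word in gram):
            continue
        gram_vector = average_vectors([embeddings[word] for word in gram])
        score = cosine_similarity(gram_vector, seed_vector)
        if score >= threshold:
            matches.append((gram, score))
    return matches


if __name__ == "__main__":
    tokens = ["called", "phone", "bank", "transfer"]
    for gram, score in match_ngrams(tokens, ("phone", "call"), TOY_EMBEDDINGS):
        print(" ".join(gram), round(score, 3))
```

Restricting `sizes` to 2 and 3 follows the observation in the notes that averaging more than three word vectors tends to blur the meaning of the resulting embedding; raising or lowering `threshold` trades recall against precision.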

Acknowledgements

We acknowledge Data to Decisions CRC (D2D-CRC) for funding this research.

Author information

Authors and Affiliations

UNSW Sydney, Sydney, NSW, 2052, Australia
Shayan Zamanirad, Boualem Benatallah & Moshe Chai Barukh
Universidad Católica Nuestra Señora de la Asunción, Tte. Cantaluppi y G. Molinas, Asunción, Paraguay
Carlos Rodriguez
QANTAS Airways, Sydney, NSW, 2020, Australia
Reza Nouri

Corresponding author

Correspondence to Carlos Rodriguez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Carlos Rodríguez, Reza Nouri: This work was done while the authors were at UNSW Sydney.

About this article

Cite this article

Zamanirad, S., Benatallah, B., Barukh, M.C. et al. Dynamic event type recognition and tagging for data-driven insights in law-enforcement. Computing 102, 1627–1651 (2020). https://doi.org/10.1007/s00607-020-00791-z

Received: 01 April 2019
Accepted: 08 January 2020
Published: 31 January 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s00607-020-00791-z
