Abstract
In law enforcement, investigators are typically tasked with analyzing large collections of evidence in order to identify and extract key information to support investigations. In this context, events are key elements that help investigators understand and reconstruct what happened from the collection of evidence items. With the ever-increasing amount of data (e.g., e-mails and social media content) gathered today as part of investigation tasks, which are mostly performed manually, managing such volumes of data can be challenging and prone to missing important details that could be highly relevant to a case. In this paper, we aim to facilitate the work of investigators through a framework for deriving insights from data. We focus on the automatic recognition and dynamic tagging of event types (e.g., phone calls) in textual evidence items, and propose a framework that supports these tasks as well as insight generation and discovery. The experimental results obtained by applying our approach to a real legal dataset demonstrate the feasibility of our proposal, showing good performance in automatically recognizing and tagging event types of interest.
Notes
We set the initial threshold to 70%; this parameter can be tuned as needed.
See earlier Footnote 1.
See earlier Footnote 4.
We chose bigrams and trigrams because, based on our observations, averaging the vectors of more than three words results in an embedding that is not semantically meaningful.
Term Frequency-Inverse Document Frequency (TF-IDF).
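The notes above mention two text representations: phrase embeddings obtained by averaging word vectors over bigrams and trigrams (capped at three words), and TF-IDF weighting. The following is a minimal Python sketch of both, not the authors' implementation. It assumes toy, hypothetical word vectors (`WORD_VECTORS`) in place of whatever pre-trained embeddings were actually used, and the common tf × log(N/df) TF-IDF variant; the function names `ngram_embedding` and `tf_idf` are likewise illustrative.

```python
import math
from collections import Counter

# Hypothetical toy word vectors (assumption): a real system would load
# pre-trained embeddings instead of this hand-written table.
WORD_VECTORS = {
    "phone": [0.9, 0.1, 0.0],
    "call": [0.8, 0.2, 0.1],
    "made": [0.1, 0.7, 0.3],
}

MAX_NGRAM = 3  # the note reports averaging more than 3 words loses meaning


def ngram_embedding(words, vectors=WORD_VECTORS, max_n=MAX_NGRAM):
    """Return the element-wise average of the word vectors in a short n-gram."""
    if not 1 <= len(words) <= max_n:
        raise ValueError(f"expected 1..{max_n} words, got {len(words)}")
    missing = [w for w in words if w not in vectors]
    if missing:
        raise KeyError(f"no vector for: {missing}")
    dim = len(vectors[words[0]])
    # Average each dimension across the words of the n-gram.
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def tf_idf(documents):
    """Compute TF-IDF weights for every term of every tokenized document.

    tf  = term count / document length
    idf = log(N / df), where df is the number of documents containing the term
    (one common variant; smoothed variants also exist).
    """
    n_docs = len(documents)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    print(ngram_embedding(["phone", "call"]))
    print(tf_idf([["phone", "call", "made"], ["phone", "email"]]))
```

Note that a term occurring in every document (here, "phone") receives a TF-IDF weight of zero under this variant, which is the intended effect of down-weighting uninformative terms.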
Acknowledgements
We acknowledge Data to Decisions CRC (D2D-CRC) for funding this research.
Additional information
Carlos Rodríguez, Reza Nouri: This work was done while the authors were at UNSW Sydney.
About this article
Cite this article
Zamanirad, S., Benatallah, B., Barukh, M.C. et al. Dynamic event type recognition and tagging for data-driven insights in law-enforcement. Computing 102, 1627–1651 (2020). https://doi.org/10.1007/s00607-020-00791-z
DOI: https://doi.org/10.1007/s00607-020-00791-z