Abstract
Legal professionals in Malawi rely on a limited number of textbooks, outdated law reports and inadequate library services. Most available documents exist only in image form and are unstructured: they contain no useful legal metadata, summaries or keynotes, and they do not support the system of citation that is essential to legal research. While advances in document processing and machine learning have benefited many fields, legal research has so far been only marginally affected. In this interdisciplinary research, the authors build semi-automatic tools for creating a corpus of Malawi criminal law decisions annotated with legal metadata and with case and law citations. Using this corpus and the machine learning tools spaCy and Gensim LDA, we extract legal metadata, including law and case citations in the form used in Malawi. We also lay the foundation for a new methodology for classifying Malawi criminal case law according to the recently introduced International Classification of Crime for Statistical Purposes (ICCS).
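As a rough illustration of what citation extraction from a judgment involves, the sketch below pulls case citations and statutory references out of plain text. It is not the authors' pipeline: the abstract states they used spaCy, whereas this sketch swaps in plain regular expressions from Python's standard library. The citation shapes it recognises ("[2015] MWHC 123" style neutral citations with the assumed court codes MWHC and MWSC, and "section N of the ... Code/Act" references), the `Citation` class and the `extract_citations` function are all hypothetical, and the sample sentence is invented.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# ASSUMPTION: neutral case citations look like "[2015] MWHC 123"
# (year in brackets, a court code, a running number). The court codes
# listed here are illustrative, not an exhaustive list.
CASE_CITATION = re.compile(
    r"\[(?P<year>\d{4})\]\s+(?P<court>MWHC|MWSC)\s+(?P<number>\d+)"
)

# ASSUMPTION: statutory references look like "section 278 of the Penal Code"
# and name a statute whose title ends in "Code" or "Act".
LAW_CITATION = re.compile(
    r"\b[Ss]ection\s+(?P<section>\d+[A-Z]?)\s+of\s+the\s+"
    r"(?P<statute>[A-Z][\w ]*?(?:Code|Act))\b"
)


@dataclass(frozen=True)
class Citation:
    kind: str  # "case" or "law"
    text: str  # matched span, verbatim


def extract_citations(judgment: str) -> list[Citation]:
    """Return case and law citations in order of appearance in the text."""
    found: list[tuple[int, Citation]] = []
    for match in CASE_CITATION.finditer(judgment):
        found.append((match.start(), Citation("case", match.group(0))))
    for match in LAW_CITATION.finditer(judgment):
        found.append((match.start(), Citation("law", match.group(0))))
    # Sort by character offset so citations come back in reading order.
    found.sort(key=lambda pair: pair[0])
    return [citation for _, citation in found]


if __name__ == "__main__":
    # Invented sample text, not taken from the corpus.
    sample = (
        "The accused was charged under section 278 of the Penal Code. "
        "The court followed [2015] MWHC 123."
    )
    for citation in extract_citations(sample):
        print(citation.kind, "->", citation.text)
```

On the invented sample this prints the law citation "section 278 of the Penal Code" followed by the case citation "[2015] MWHC 123". Hand-written patterns like these are brittle against variant citation forms; a trainable pipeline such as spaCy can be taught to generalise beyond a fixed list of patterns.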
Data Availability Statement
The datasets generated and/or analysed during the current study are not publicly available but are available from the corresponding author on reasonable request. A sample of the dataset and annotations are available on Zenodo (Taylor 2021).
References
Atkins S, Clear J, Ostler N (1992) Corpus design criteria. Lit Linguist Comput 7(1). https://doi.org/10.1093/llc/7.1.1
Baumann T, Kerner H-J, Mischkowitz R, Hergenhahn H (2016) National implementation of the new International Classification of Crimes for Statistical Purposes (ICCS). WISTA 5:102. https://www.destatis.de/EN/Methods/WISTAScientificJournal/Downloads/national-implementation-052016.pdf?__blob=publicationFile
Bisogno E, Dawson-Faber J, Jandl M (2015) The international classification of crime for statistical purposes: a new instrument to improve comparative criminological research. Eur J Criminol 12(5):535–550. https://doi.org/10.1177/1477370815600609
Blair DC, Maron ME (1990) Full-text information retrieval: further analysis and clarification. Inf Process Manag 26:437–447. https://doi.org/10.1016/0306-4573(90)90102-8
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Chang D, Lim S-S (2016) Framework research for the development of the Korean classification of crime. Korean Institute of Criminology. https://eng.kic.re.kr/brdartcl/boardarticleView.do?srch_menu_nix=w5mg0hj7&brd_id=BDIDX_736t9S87ryDqxzPmkp5987&cont_idx=842&srch_mu_lang=CDIDX00023
Gloppen S, Kanyongolo FE (2007) Courts and the poor in Malawi: economic marginalization, vulnerability, and the law. Int J Const Law 5:258–293. https://doi.org/10.1093/icon/mom002
Jackson P, Al-Kofahi K, Tyrrell A, Vachher A (2003) Information extraction from case law and retrieval of prior cases. Artif Intell 150:60–70. https://doi.org/10.1016/S0004-3702(03)00106-1
Jelodar H, Wang Y, Yuan C, Feng X, Jiang X, Li Y, Zhao L (2019) Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimed Tools Appl 78(11):15169–15211. https://doi.org/10.1007/s11042-018-6894-4. arXiv:1711.04305
Jenkins J (2009) Where angels fear to tread: the problems of keyword search in e-discovery. https://static2.ftitechnology.com/docs/white-papers/white-paper-ediscovery-keyword-search-2009.pdf. Accessed 12/10/2020
Kelsen H (1991) General theory of norms. Clarendon, Oxford. https://doi.org/10.1093/acprof:oso/9780198252177.001.0001
Kim D, Seo D, Cho S, Kang P (2019) Multi-co-training for document classification using various document representations: TF-IDF, LDA, and doc2vec. Inf Sci 477:15–29. https://doi.org/10.1016/j.ins.2018.10.006
Koniaris M, Papastefanatos G, Anagnostopoulos I (2018) Solon: a holistic approach for modelling, managing and mining legal sources. Algorithms 11(12):196. https://doi.org/10.3390/a11120196
Mail and Guardian staff reporter (2015) Access to the law is key to Africa’s future. https://mg.co.za/article/2015-08-28-00-access-to-the-law-is-key-to-africas-future/. Accessed 12/10/2020
Malawi Legal Information Institute. https://malawilii.org/. Accessed 10/05/2020
Taylor A (2021) MWCC: a corpus of Malawi criminal cases – extract. Zenodo. https://doi.org/10.5281/zenodo.5501086
United Nations Office on Drugs and Crime (2015) International classification of crime for statistical purposes (ICCS), version 1.0. https://www.unodc.org/documents/data-and-analysis/statistics/crime/ICCS/ICCS_English_2016_web.pdf
United Nations Office on Drugs and Crime, Dewan S (2018) Global update on ICCS implementation/Implementation manual review process. Joint second meeting of UN-CTS focal points and ICCS-TAG members, Lima, 07–08 June 2018. https://www.unodc.org/documents/data-and-analysis/statistics/Activities/Session_2_ICCS_-_Global_Update_and_Implementation_Manual.pdf
Van Dijk B (2017) Towards text analytical information enrichment in the analysis of crime. Master's thesis, Eindhoven University of Technology, Eindhoven
Funding
Part of this work was funded by Artificial Intelligence 4 Development under Grant No. BA200207E.
About this article
Cite this article
Taylor, A.V., Mfutso-Bengo, E. Towards a machine understanding of Malawi legal text. Artif Intell Law 31, 1–11 (2023). https://doi.org/10.1007/s10506-021-09303-6
DOI: https://doi.org/10.1007/s10506-021-09303-6