Abstract
Legal texts contain a large number of cross-references to various bodies of text, each serving a different purpose. Authorities and companies often need to examine particular types of these citations, yet automatic tools to aid in this process are lacking. Recently, citation graphs have been used to improve the intelligibility of complex rule frameworks. We propose an algorithm that builds the citation graph from a document and automatically labels each edge according to its purpose. Our method uses the citing text only and thus works only on citations whose purpose can be uniquely identified from their surrounding text. We apply this framework to the US Code. We define and evaluate a standard gold set of labels that covers the vast majority of citation types appearing in the US Code yet is short enough for practical use. We also propose a novel linear-chain conditional random field model that extracts the features required for labeling the citations from the surrounding text. We then analyze the effectiveness of different clustering and classification methods, such as K-means and support vector machines, for automatically assigning each citation its corresponding label. Finally, we discuss the practical difficulties of this task and compare human accuracy with that of our end-to-end algorithm.
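As a rough illustration of the pipeline summarized above, the following sketch builds a small citation graph and labels each edge from the citing text alone. It is not the method evaluated in the paper: a keyword lookup stands in for the conditional random field and the classifier. The citation pattern, the label names, the cue phrases, and the function names are all assumptions made for this example and are not taken from the gold label set.

```python
import re
from collections import defaultdict

# Hypothetical citation pattern: matches forms such as "section 1402 of title 26".
# Real US Code citations take many more forms; this handles one for illustration.
CITATION_RE = re.compile(r"section\s+(\d+[a-z]?)\s+of\s+title\s+(\d+)", re.IGNORECASE)

# Hypothetical labels and cue phrases, NOT the gold label set defined in the paper.
CUE_PHRASES = {
    "definition": ("as defined in", "has the meaning given"),
    "exception": ("except as provided in", "notwithstanding"),
    "authority": ("pursuant to", "under the authority of"),
}


def label_citation(context):
    """Assign a label from the citing text only, using simple keyword cues."""
    lowered = context.lower()
    for label, cues in CUE_PHRASES.items():
        if any(cue in lowered for cue in cues):
            return label
    return "unlabeled"


def build_citation_graph(sections, window=60):
    """Return {source: [(target, label), ...]} for every citation found."""
    graph = defaultdict(list)
    for source, text in sections.items():
        for match in CITATION_RE.finditer(text):
            target = f"{match.group(2)} USC {match.group(1)}"
            # Use up to `window` characters before the citation, cut at the
            # last period so cues from an earlier sentence do not leak in.
            start = max(0, match.start() - window)
            context = text[start:match.start()].rsplit(".", 1)[-1]
            graph[source].append((target, label_citation(context)))
    return dict(graph)


sections = {
    "26 USC 1": "The term 'employee' has the meaning given in section 3401 of title 26.",
    "26 USC 2": "Except as provided in section 1 of title 26, the tax applies. "
                "See also section 7701 of title 26.",
}
graph = build_citation_graph(sections)
for source, edges in graph.items():
    print(source, edges)
```

A citation whose context contains no cue phrase is left as "unlabeled", which mirrors the limitation stated in the abstract: only citations whose purpose can be identified from the surrounding text can be labeled.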

Notes
This dataset was also obtained during the annotation process but lacked semantic labels for the citations. Using it reduces the chance of overfitting because predicate extraction is learned on a different dataset from the one used to train the label classifier.
Acknowledgements
We thank two anonymous reviewers for their insightful feedback, which helped us improve this manuscript. In addition, the authors would like to thank Vironica I Brown, Roman Diveev, Max Goldstein, Eva L Lauer, Nicholas W Long, Paul J Punzone and Joseph M Ragukonis for their contributions to the annotation process. We would also like to thank Benjamin Grider for his help in designing the graphical user interface for our system.
Additional information
This work is partially supported by the UF CISE Data Science Research Lab, the UF Law School and the ICAIR Program.
Cite this article
Sadeghian, A., Sundaram, L., Wang, D.Z. et al. Automatic semantic edge labeling over legal citation graphs. Artif Intell Law 26, 127–144 (2018). https://doi.org/10.1007/s10506-018-9217-1
DOI: https://doi.org/10.1007/s10506-018-9217-1