Abstract
To ensure validity in legal texts such as contracts and case law, lawyers rely on standardised formulations that are carefully worded and that also act as a kind of code whose meaning and function are known to all legal experts. Using directed (acyclic) graphs to represent standardised text fragments, we are able to capture variations in time specifications, slight rephrasings, names and places, as well as OCR errors. We show how such text fragments can be found by sentence clustering, pattern detection and pattern clustering. To test the proposed methods, we use two corpora of German contracts and court decisions, specially compiled for this purpose. However, the entire process for representing standardised text fragments is language-agnostic. We analyse and compare both corpora, give a quantitative and qualitative analysis of the text fragments found, and present a number of examples from both corpora.
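The idea of representing variants of one formulation as a single directed acyclic graph can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_fragment_graph` and `enumerate_paths`, the English example sentences, and the use of Python's `difflib.SequenceMatcher` for token alignment are all assumptions made for illustration, and the sketch skips the sentence clustering and pattern detection steps entirely. Tokens shared with the first variant reuse its nodes, while differing tokens (here a party designation and a date) become alternative branches.

```python
from difflib import SequenceMatcher


def build_fragment_graph(variants):
    """Merge tokenized sentence variants into one directed acyclic word graph.

    Returns (labels, edges): labels maps node id -> token, edges is a set of
    (source id, target id) pairs. Node 0 is START and node 1 is END.
    Illustrative only: every variant is aligned to the first one.
    """
    labels = {0: "<START>", 1: "<END>"}
    edges = set()
    backbone_tokens = variants[0]
    backbone_ids = []
    for token in backbone_tokens:  # one node per token of the first variant
        node = len(labels)
        labels[node] = token
        backbone_ids.append(node)

    for tokens in variants:
        matcher = SequenceMatcher(a=backbone_tokens, b=tokens, autojunk=False)
        shared = {}  # position in this variant -> backbone node id
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                shared[block.b + k] = backbone_ids[block.a + k]
        previous = 0
        for pos, token in enumerate(tokens):
            node = shared.get(pos)
            if node is None:  # token differs from the backbone: new branch
                node = len(labels)
                labels[node] = token
            edges.add((previous, node))
            previous = node
        edges.add((previous, 1))
    return labels, edges


def enumerate_paths(labels, edges, node=0):
    """Yield every token sequence spelled by a path from START to END."""
    if node == 1:
        yield []
        return
    for source, target in sorted(edges):
        if source == node:
            for rest in enumerate_paths(labels, edges, target):
                yield ([] if target == 1 else [labels[target]]) + rest


# Hypothetical example sentences, not taken from the corpora.
variants = [
    "this contract enters into force on 1 January 2020".split(),
    "this contract enters into force on 15 March 2021".split(),
    "this agreement enters into force on 1 January 2020".split(),
]
labels, edges = build_fragment_graph(variants)
for path in enumerate_paths(labels, edges):
    print(" ".join(path))
```

Because the branches are independent, the graph also spells combinations that never occurred in the input, such as the "agreement" wording with the second date. Whether that generalisation is desirable depends on the application; the sketch only shows the data structure.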
Notes
1. The sources for the documents compiled for both corpora will be published on our website: http://textmining.wp.hs-hannover.de/juver.html. Likewise, we publish the developed methods and also the document collections on our project page.
A Appendices
A.1 Sources for Case Law Corpus
1. Bundesgerichtshof (BGH) – Decisions from criminal law: https://www.hrr-strafrecht.de/hrr/db/abfrage.php?type=erweitert&sortieren=relevanz&sortrichtung=ab&gericht=BGH&aktenzeichen=&datvon=&datbis=&volltext=&kurzbeschreibung=&norm=StGB&medium=-&verknuepfung=und&sz=2.
A.2 Sources for Contract Corpus
1. Stadtverwaltung Hansestadt Hamburg – City administration of Hamburg: http://suche.transparenz.hamburg.de/dataset?q=vertrag&esq_title=&check_all_
2. Stadtverwaltung Bremen – City administration of Bremen: https://www.transparenz.bremen.de, Keyword: Vertrag
3. Cooperation contracts between universities, and between universities and service providers: we searched specifically for contract files on university websites and added them to the contract corpus.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Josi, F., Wartena, C., Heid, U. (2021). Representing Standard Text Formulations as Directed Graphs. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_34
DOI: https://doi.org/10.1007/978-3-030-86159-9_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86158-2
Online ISBN: 978-3-030-86159-9
eBook Packages: Computer Science, Computer Science (R0)