Representing Standard Text Formulations as Directed Graphs

Josi, Frieda; Wartena, Christian; Heid, Ulrich

doi:10.1007/978-3-030-86159-9_34

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12917)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

To ensure validity in legal texts such as contracts and case law, lawyers rely on standardised formulations that are carefully worded but also act as a kind of code whose meaning and function are known to all legal experts. Using directed (acyclic) graphs to represent standardised text fragments, we are able to capture variations in time specifications, slight rephrasings, names and places, as well as OCR errors. We show how such text fragments can be found by sentence clustering, pattern detection and pattern clustering. To test the proposed methods, we use two corpora of German contracts and court decisions, compiled specifically for this purpose. The entire process for representing standardised text fragments is, however, language-agnostic. We analyse and compare both corpora, give a quantitative and qualitative analysis of the text fragments found, and present a number of examples from both corpora.
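
As a rough illustration of how a directed graph can capture such variation, the following Python sketch merges a few invented sentence variants into a word graph and lists the nodes at which the variants diverge. It is a simplified construction and not the method of the paper: the function names `build_word_graph` and `variable_slots`, the `(token, occurrence)` node scheme and the example sentences are all assumptions made for this sketch, and the resulting graph is not guaranteed to be acyclic for arbitrary input.

```python
from collections import defaultdict

START, END = "<s>", "</s>"


def build_word_graph(sentences):
    """Merge whitespace-tokenized sentences into one directed graph.

    Nodes are (token, k) pairs, where k counts how often the token has
    already occurred earlier in the same sentence. Each edge maps to the
    number of sentences that traverse it.
    """
    edges = defaultdict(int)
    for sentence in sentences:
        seen = defaultdict(int)
        prev = (START, 0)
        for token in sentence.split():
            node = (token, seen[token])
            seen[token] += 1
            edges[(prev, node)] += 1
            prev = node
        edges[(prev, (END, 0))] += 1
    return dict(edges)


def variable_slots(edges):
    """Return nodes with more than one successor, i.e. points of variation."""
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)
    return {src: sorted(dsts) for src, dsts in successors.items() if len(dsts) > 1}


# Hypothetical variants of one standard formulation (invented for this sketch).
variants = [
    "the contract ends on 31 December 2020",
    "the contract ends on 30 June 2021",
    "the agreement ends on 30 June 2021",
]
edges = build_word_graph(variants)
slots = variable_slots(edges)
for node, alternatives in slots.items():
    print(node[0], "->", [alt[0] for alt in alternatives])
```

In this toy input, the shared wording collapses into single edges with higher counts, while the date and the noun after "the" show up as branching points. A real pipeline would additionally need the clustering and pattern-detection steps that the abstract mentions, which this sketch does not attempt.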

Notes

1. The sources for the documents compiled for both corpora will be published on our website: http://textmining.wp.hs-hannover.de/juver.html. The developed methods and the document collections are likewise published on our project page.

Author information

Authors and Affiliations

University of Applied Sciences and Arts Hanover, Expo Plaza 12, 30539, Hanover, Germany
Frieda Josi & Christian Wartena
University of Hildesheim, Lübecker Straße 3, 31141, Hildesheim, Germany
Ulrich Heid

Corresponding author

Correspondence to Frieda Josi.

Editor information

Editors and Affiliations

Boise State University, Boise, ID, USA
Elisa H. Barney Smith
Indian Statistical Institute, Kolkata, India
Umapada Pal

A Appendices

A.1 Sources for Case Law Corpus

1. Bundesgerichtshof (BGH) – Decisions from criminal law: https://www.hrr-strafrecht.de/hrr/db/abfrage.php?type=erweitert&sortieren=relevanz&sortrichtung=ab&gericht=BGH&aktenzeichen=&datvon=&datbis=&volltext=&kurzbeschreibung=&norm=StGB&medium=-&verknuepfung=und&sz=2.

A.2 Sources for Contract Corpus

1. Stadtverwaltung Hansestadt Hamburg – City administration of Hamburg: http://suche.transparenz.hamburg.de/dataset?q=vertrag&esq_title=&check_all_
2. Stadtverwaltung Bremen – City administration of Bremen: https://www.transparenz.bremen.de, keyword: Vertrag
3. Cooperation contracts between universities, and between universities and service providers: we searched specifically for contract files on university websites and added them to the contract corpus.

Cite this paper

Josi, F., Wartena, C., Heid, U. (2021). Representing Standard Text Formulations as Directed Graphs. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_34

DOI: https://doi.org/10.1007/978-3-030-86159-9_34
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86158-2
Online ISBN: 978-3-030-86159-9
eBook Packages: Computer Science, Computer Science (R0)
