Representing Standard Text Formulations as Directed Graphs

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 Workshops (ICDAR 2021)

Abstract

In order to ensure validity in legal texts such as contracts and case law, lawyers rely on standardised formulations that are written carefully but also represent a kind of code with a meaning and function known to all legal experts. By using directed (acyclic) graphs to represent standardised text fragments, we can capture variation in time specifications, slight rephrasings, names and places, as well as OCR errors. We show how such text fragments can be found by clustering sentences, detecting patterns and clustering those patterns. To test the proposed methods, we use two corpora of German contracts and court decisions, compiled specifically for this purpose. However, the entire process for representing standardised text fragments is language-agnostic. We analyse and compare both corpora, give a quantitative and qualitative analysis of the text fragments found, and present a number of examples from both corpora.
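
One way to picture a directed acyclic graph over a standard formulation is a small word graph. The following is a minimal sketch and not the authors' implementation. It assumes that the variants of one formulation have already been clustered and tokenised on whitespace. The first sentence serves as a backbone, and each further variant is aligned to it with Python's `difflib.SequenceMatcher`. Aligned tokens share a node, while unaligned tokens (for example a different date) become a parallel branch. The helper names `build_graph` and `paths`, the sentinel nodes and the example sentences are all hypothetical. The sketch requires Python 3.9+ for `graphlib`.

```python
from difflib import SequenceMatcher
from graphlib import TopologicalSorter  # Python 3.9+

START, END = 0, 1  # hypothetical sentinel node ids


def build_graph(sentences):
    """Merge tokenised variants of one formulation into a word DAG.

    Tokens that difflib aligns with the first (backbone) sentence reuse its
    nodes; all other tokens get fresh nodes, so every input is a START..END path.
    """
    labels = {START: "<s>", END: "</s>"}
    edges = {}  # node id -> set of successor node ids

    def new_node(token):
        node = len(labels)
        labels[node] = token
        return node

    def add_path(nodes):
        path = [START, *nodes, END]
        for a, b in zip(path, path[1:]):
            edges.setdefault(a, set()).add(b)

    backbone_tokens = sentences[0]
    backbone = [new_node(tok) for tok in backbone_tokens]
    add_path(backbone)

    for tokens in sentences[1:]:
        matcher = SequenceMatcher(None, backbone_tokens, tokens, autojunk=False)
        mapped = [None] * len(tokens)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                mapped[block.b + k] = backbone[block.a + k]
        # Unaligned tokens get fresh nodes, forming a parallel branch.
        add_path([node if node is not None else new_node(tok)
                  for tok, node in zip(tokens, mapped)])
    return labels, edges


def paths(labels, edges, node=START):
    """Yield every token sequence spelled out by a START..END path."""
    if node == END:
        yield []
        return
    for nxt in sorted(edges.get(node, ())):
        head = [] if nxt == END else [labels[nxt]]
        for rest in paths(labels, edges, nxt):
            yield head + rest


variants = [
    "der Vertrag tritt am 1. Januar 2020 in Kraft".split(),
    "der Vertrag tritt am 1. März 2021 in Kraft".split(),
    "dieser Vertrag tritt am 1. Januar 2020 in Kraft".split(),
]
labels, edges = build_graph(variants)
# Successor sets are read as predecessors here; that reverses the order
# but not cycle detection: CycleError is raised if the graph is not a DAG.
list(TopologicalSorter(edges).static_order())
found = {" ".join(p) for p in paths(labels, edges)}
print(len(found))  # 4: the three inputs plus one unseen combination
```

Every variant visits backbone nodes in increasing order, and its fresh nodes belong only to its own path, so the result stays acyclic. Enumerating paths also yields combinations that never occurred in the input (here "dieser Vertrag tritt am 1. März 2021 in Kraft"). This is one way a graph can generalise over variation. Whether such unseen paths should be accepted is a design choice that this sketch leaves open.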

Notes

  1.

    The sources for the documents compiled for both corpora will be published on our website: http://textmining.wp.hs-hannover.de/juver.html. We will also publish the developed methods and the document collections on our project page.

Author information

Correspondence to Frieda Josi.

A Appendices

A.1 Sources for Case Law Corpus

  1.

    Bundesgerichtshof (BGH) – Decisions from criminal law: https://www.hrr-strafrecht.de/hrr/db/abfrage.php?type=erweitert&sortieren=relevanz&sortrichtung=ab&gericht=BGH&aktenzeichen=&datvon=&datbis=&volltext=&kurzbeschreibung=&norm=StGB&medium=-&verknuepfung=und&sz=2.

A.2 Sources for Contract Corpus

  1.

    Stadtverwaltung Hansestadt Hamburg – City administration of Hamburg: http://suche.transparenz.hamburg.de/dataset?q=vertrag&esq_title=&check_all_

  2.

    Stadtverwaltung Bremen – City administration of Bremen: https://www.transparenz.bremen.de, Keyword: Vertrag

  3.

    Cooperation contracts between universities, and between universities and service providers: we searched specifically for contract files on university websites and added them to the contract corpus.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Josi, F., Wartena, C., Heid, U. (2021). Representing Standard Text Formulations as Directed Graphs. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_34

  • DOI: https://doi.org/10.1007/978-3-030-86159-9_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86158-2

  • Online ISBN: 978-3-030-86159-9

  • eBook Packages: Computer Science, Computer Science (R0)
