On Integrating and Classifying Legal Text Documents

  • Conference paper
Database and Expert Systems Applications (DEXA 2020)

Abstract

This paper presents an exhaustive and unified dataset built from the judgments of the European Court of Human Rights since the Court's creation. The value of such a database is explained from the perspectives of the researcher, the data scientist, the citizen and the legal practitioner. Unlike many datasets, the creation process, from the collection of raw data to the feature transformation, is provided as a collection of fully automated and open-source scripts. This ensures reproducibility and a high level of confidence in the processed data, which are among the most important issues in data governance today. A first experimental campaign is performed to study some predictability properties and to establish baseline results for popular machine learning algorithms. The results are consistently good across the binary datasets, with accuracy ranging from 75.86% to 98.32% and a micro-average accuracy of 96.44%.

Notes

  1. http://scdb.wustl.edu/

  2. https://hudoc.echr.coe.int/eng

  3. https://fantasyscotus.lexpredict.com/

  4. http://scdb.wustl.edu/

  5. https://hudoc.echr.coe.int/eng

  6. https://github.com/aquemy/ECHR-OD_predictions

Author information

Corresponding author

Correspondence to Robert Wrembel.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Quemy, A., Wrembel, R. (2020). On Integrating and Classifying Legal Text Documents. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_25

  • DOI: https://doi.org/10.1007/978-3-030-59003-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59002-4

  • Online ISBN: 978-3-030-59003-1

  • eBook Packages: Computer Science (R0)
