DOI: 10.1145/3503823.3503898
research-article

A Natural Language Processing Survey on Legislative and Greek Documents

Published: 22 February 2022

ABSTRACT

Natural Language Processing (NLP) is developing rapidly alongside the increasingly complex applications that rely on it, and these applications are likely to depend on it even more in the future. The field still faces many challenges that require the attention of both researchers and businesses. State-of-the-art approaches usually rely on deep neural networks. Our work presents a rigorous review of the literature in the field, focusing on legal and Greek documents. We also outline the field's current challenges and some considerations for future work.


            Published in: PCI '21: Proceedings of the 25th Pan-Hellenic Conference on Informatics (ACM Other conferences), November 2021, 499 pages

            Copyright © 2021 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher: Association for Computing Machinery, New York, NY, United States


            Qualifiers: research-article, Research, Refereed limited

            Acceptance rate: 190 of 390 submissions overall (49%)
