Extracting Definitions from Brazilian Legal Texts

  • Conference paper
Computational Science and Its Applications – ICCSA 2012 (ICCSA 2012)

Abstract

To avoid ambiguity and to ensure, as far as possible, a strict interpretation of the law, legal texts usually define the specific lexical terms used within their discourse by means of normative rules. Because a large number of rules is often in effect in a given domain, extracting these definitions manually would be a costly undertaking. This paper presents an approach to this problem based on a variation of an automated natural language processing technique for Brazilian Portuguese texts. For the sake of generality, the proposed solution was developed to address the broader problem of building a glossary from domain-specific texts that contain definitions. The solution was applied to a corpus of texts from the telecommunications regulation domain, and the results are reported. The usual natural language processing pipeline was followed: preprocessing, segmentation, and part-of-speech tagging. A set of feature extraction functions is specified and used, together with reference glossary information on whether or not a text fragment is a definition, to train an SVM classifier. Finally, the definitions are extracted from the texts and evaluated on a test corpus, which also contains the reference glossary annotations on definitions. The results are then discussed in light of other definition extraction techniques.
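The pipeline outlined in the abstract (segmentation, feature extraction, and a classifier trained on fragments labelled as definition or non-definition) can be illustrated with a minimal sketch. It is not the authors' implementation: the cue phrases, the two features, the toy sentences, and all function names are assumptions made for illustration, part-of-speech tagging is omitted, and a small hand-written linear SVM (sub-gradient descent on the hinge loss) stands in for whatever SVM library and kernel the paper actually used.

```python
import re

# Hypothetical cue phrases that often introduce definitions in Portuguese
# legal text. These are illustrative assumptions, not the paper's features.
CUE_PATTERNS = [r"\bconsidera-se\b", r"\bentende-se por\b", r"\bdenomina-se\b"]


def segment(text):
    """Naive segmentation: split on whitespace that follows '.' or ';'."""
    return [s for s in re.split(r"(?<=[.;])\s+", text.strip()) if s]


def extract_features(sentence):
    """Return [has_cue_phrase, has_leading_term_colon] as floats."""
    lowered = sentence.lower()
    has_cue = any(re.search(p, lowered) for p in CUE_PATTERNS)
    # A short term followed by a colon near the start, as in glossary items.
    has_term_colon = re.match(r"^[^:]{1,60}:\s", sentence) is not None
    return [float(has_cue), float(has_term_colon)]


def train_linear_svm(X, y, epochs=500, lr=0.05, lam=0.01):
    """Linear SVM via sub-gradient descent on the L2-regularised hinge loss.

    Labels must be +1 (definition) or -1 (not a definition).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularisation step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move towards classifying xi correctly.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Invented toy training set standing in for the reference glossary annotations.
TRAIN = [
    ("Para os fins desta Lei, considera-se obra toda construção, reforma ou ampliação.", 1),
    ("Entende-se por serviço toda atividade destinada a obter determinada utilidade.", 1),
    ("I - Obra: toda construção, reforma ou ampliação.", 1),
    ("Esta Lei entra em vigor na data de sua publicação.", -1),
    ("O prazo para recurso será de trinta dias.", -1),
]
X = [extract_features(sentence) for sentence, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_linear_svm(X, y)


def is_definition(sentence):
    """Classify one sentence with the toy model trained above."""
    return predict(w, b, extract_features(sentence)) == 1
```

A glossary candidate list would then be obtained with something like `[s for s in segment(document_text) if is_definition(s)]`, where `document_text` is a hypothetical string holding one legal text. With real data, the feature set would be much richer (for example, part-of-speech patterns), and the evaluation would be run on a held-out test corpus rather than on the training sentences.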

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferneda, E., do Prado, H.A., Batista, A.H., Pinheiro, M.S. (2012). Extracting Definitions from Brazilian Legal Texts. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31137-6_48

  • DOI: https://doi.org/10.1007/978-3-642-31137-6_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31136-9

  • Online ISBN: 978-3-642-31137-6

  • eBook Packages: Computer Science, Computer Science (R0)
