RulingBR: A Summarization Dataset for Legal Texts

de Vargas Feijó, Diego; Moreira, Viviane Pereira

doi:10.1007/978-3-319-99722-3_26

RulingBR: A Summarization Dataset for Legal Texts

Conference paper
First Online: 26 August 2018

1014 Accesses
4 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11122))

Abstract

Text summarization consists in generating a shorter version of an input document, which captures its main ideas. Despite the recent developments in this area, most of the existing techniques have been tested mostly in English and Chinese, due in part to the low availability of datasets in other languages. In addition, experiments have been run mostly on collections of news articles, which could lead to some bias in the research. In this paper, we address both these limitations by creating a dataset for the summarization of legal texts in Portuguese. The dataset, called RulingBR, contains about 10K rulings from the Brazilian Federal Supreme Court. We describe how the dataset was assembled and we also report on the results of standard summarization methods which may serve as a baseline for future works.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

1.
https://duc.nist.gov/.
2.
http://www.stf.jus.br/.

References

Aleixo, P., Pardo, T.A.S.: CSTNews: um córpus de textos jornalísticos anotados segundo a teoria discursiva multidocumento CST (cross-document structure theory) (2008)
Google Scholar
Barrios, F., López, F., Argerich, L., Wachenchauzer, R.: Variations of the similarity function of TextRank for automated summarization. arXiv preprint arXiv:1602.03606 (2016)
Belica, M.: Sumy: module for automatic summarization of text documents and HTML pages, April 2018. https://github.com/miso-belica/sumy
Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL on Interactive Presentation Sessions, COLING-ACL 2006, pp. 69–72. Association for Computational Linguistics, Stroudsburg (2006)
Google Scholar
Chen, D., Bolton, J., Manning, C.D.: A thorough examination of the CNN/Daily mail reading comprehension task. CoRR abs/1606.02858 (2016). http://arxiv.org/abs/1606.02858
Collovini, S., Carbonel, T.I., Fuchs, J., Coelho, J.C., Rino, L., Vieira, R.: Summ-it: Um corpus anotado com informações discursivas visando à sumarização automática. In: V Workshop em Tecnologia da Informação e da Linguagem Humana, Congresso da SBC, pp. 1605–1614 (2007)
Google Scholar
Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)
Article Google Scholar
Ganesan, K., Zhai, C., Han, J.: Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions. In: Proceedings of the 23rd International Conference on Computational Linguistics, pp. 340–348. Association for Computational Linguistics (2010)
Google Scholar
Huyck, C., Orengo, V.: A stemming algorithm for the Portuguese language. In: International Symposium on String Processing and Information Retrieval, SPIRE, p. 0186, November 2001
Google Scholar
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
Article MathSciNet Google Scholar
Lin, C.Y.: Rouge: a package for automatic evaluation of summaries. Text Summarization Branches Out (2004)
Google Scholar
Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)
Article MathSciNet Google Scholar
Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: Proceedings of EMNLP 2004 and the 2004 Conference on Empirical Methods in Natural Language Processing, July 2004
Google Scholar
Nallapati, R., Xiang, B., Zhou, B.: Sequence-to-sequence RNNs for text summarization. CoRR abs/1602.06023 (2016). http://arxiv.org/abs/1602.06023
Napoles, C., Gormley, M., Van Durme, B.: Annotated gigaword. In: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-Scale Knowledge Extraction, AKBC-WEKEX 2012, pp. 95–100. Association for Computational Linguistics, Stroudsburg (2012)
Google Scholar
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. In: Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia, pp. 161–172 (1998). citeseer.nj.nec.com/page98pagerank.html
Pardo, T.A.S., Rino, L.H.M.: Temário: Um corpus para sumarização automática de textos. Universidade de São Carlos, Relatório Técnico, São Carlos (2003)
Google Scholar
Parker, R., Graff, D., Kong, J., Chen, K., Maeda, K.: English gigaword fifth edition, linguistic data consortium. Google Scholar (2011)
Google Scholar
Paulus, R., Xiong, C., Socher, R.: A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304 (2017)
Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, May 2010
Google Scholar
ScrapingHub: Scrapy - a fast and powerful scraping and web crawling framework (2018). https://scrapy.org
See, A., Liu, P.J., Manning, C.D.: Get to the point: summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368 (2017)
Xiao: PyTeaser: Summarizes news articles, April 2018. https://github.com/xiaoxu193/PyTeaser
Xin Pan, P.L.: Models: models and examples built with TensorFlow, April 2018. https://github.com/tensorflow/models
Yin, W., Pei, Y.: Optimizing sentence modeling and selection for document summarization. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1383–1389. AAAI Press (2015)
Google Scholar
Zhang, X., Lapata, M.: Sentence simplification with deep reinforcement learning. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, 9–11 September 2017, pp. 584–594 (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
Diego de Vargas Feijó & Viviane Pereira Moreira

Authors

Diego de Vargas Feijó
View author publications
You can also search for this author in PubMed Google Scholar
Viviane Pereira Moreira
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Diego de Vargas Feijó .

Editor information

Editors and Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Aline Villavicencio
Instituto de Informática - UFRGS, Porto Alegre, Brazil
Viviane Moreira
INESC-ID, Lisbon, Portugal
Alberto Abad
UFSCAR, Sao Carlos, Brazil
Helena Caseli
Centro Singular de Investigación en Tecnoloxías, Universidade de Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Pablo Gamallo
Université de Toulon, Parc Scientifique Technologique Luminy, Marseille, France
Carlos Ramisch
Centro de Informática e Sistemas, Universidade de Coimbra, Coimbra, Portugal
Hugo Gonçalo Oliveira
Federal University of Technology, Dois Vizinhos, Paraná, Brazil
Gustavo Henrique Paetzold

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

de Vargas Feijó, D., Moreira, V.P. (2018). RulingBR: A Summarization Dataset for Legal Texts. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science(), vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_26

Download citation

DOI: https://doi.org/10.1007/978-3-319-99722-3_26
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics