Ordering sentences and paragraphs with pre-trained encoder-decoder transformers and pointer ensembles

Authors:
Rémi Calizzano, DFKI GmbH, Berlin, Germany
Malte Ostendorff, DFKI GmbH, Berlin, Germany
Georg Rehm, DFKI GmbH, Berlin, Germany

DocEng '21: Proceedings of the 21st ACM Symposium on Document Engineering, August 2021, Article No.: 10, Pages 1–9, https://doi.org/10.1145/3469096.3469874

Published: 16 August 2021

ABSTRACT

Passage ordering aims to maximise discourse coherence in document generation or document modification tasks such as summarisation or storytelling. This paper extends the passage ordering task from sentences to paragraphs, i.e., passages with multiple sentences. Increasing the passage length increases the task's difficulty. To account for this, we propose combining a pre-trained encoder-decoder Transformer model, namely BART, with variations of pointer networks. We empirically evaluate the proposed models for sentence and paragraph ordering. Our best model outperforms previous state-of-the-art methods by 0.057 Kendall's Tau on one of three sentence ordering benchmarks (arXiv, VIST, ROC-Story). For paragraph ordering, we construct two novel datasets from Wikipedia and CNN-DailyMail, on which we achieve 0.67 and 0.47 Kendall's Tau. The best model variation utilises multiple pointer networks in an ensemble-like fashion. We hypothesise that the use of multiple pointers better reflects the multitude of possible orders of paragraphs in more complex texts. Our code, data, and models are publicly available¹.
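
The abstract reports all results as Kendall's Tau. As a rough illustration of what such a score measures, here is a minimal sketch, assuming the formulation commonly used for ordering evaluation: 1 minus twice the fraction of passage pairs that appear in the wrong relative order. The function name `kendall_tau` and the integer passage identifiers are illustrative assumptions and are not taken from the authors' code.

```python
from itertools import combinations


def kendall_tau(predicted, gold):
    """Kendall's Tau between a predicted and a gold ordering.

    Both arguments are sequences containing the same unique passage
    identifiers. Returns 1.0 for identical orders and -1.0 for a fully
    reversed order.
    """
    if sorted(predicted) != sorted(gold):
        raise ValueError("both orderings must contain the same passages")
    n = len(gold)
    if n < 2:
        return 1.0  # a single passage is trivially in order

    # Position of each passage in the gold order.
    gold_rank = {passage: i for i, passage in enumerate(gold)}
    ranks = [gold_rank[passage] for passage in predicted]

    # Count pairs whose relative order disagrees with the gold order.
    inversions = sum(
        1 for i, j in combinations(range(n), 2) if ranks[i] > ranks[j]
    )
    total_pairs = n * (n - 1) / 2
    return 1.0 - 2.0 * inversions / total_pairs


if __name__ == "__main__":
    gold = [0, 1, 2, 3]
    print(kendall_tau([0, 1, 2, 3], gold))  # identical order
    print(kendall_tau([1, 0, 2, 3], gold))  # one adjacent swap
    print(kendall_tau([3, 2, 1, 0], gold))  # fully reversed
```

Under this formulation, swapping two adjacent passages in a four-passage document inverts one of six pairs, giving 1 - 2/6, roughly 0.67.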

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
      2. Document structure
    2. Retrieval models and ranking
      1. Language models

Published in
DocEng '21: Proceedings of the 21st ACM Symposium on Document Engineering
August 2021
178 pages
ISBN: 9781450385961
DOI: 10.1145/3469096
General Chairs: Patrick Healy (University of Limerick, Ireland), Mihai Bilauca (University of Limerick, Ireland)
Program Chair: Alexandra Bonnici (University of Malta, Malta)
Copyright © 2021 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
discourse coherence
ordering
pointer networks
summarisation
transformers
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 178 of 537 submissions, 33%