Automatic Alignment of News Texts and Their Multi-document Summaries: Comparison among Methods

Agostini, Verônica; López Condori, Roque Enrique; Pardo, Thiago Alexandre Salgueiro

doi:10.1007/978-3-319-09761-9_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8775)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Aligning texts and their multi-document summaries is the task of determining the correspondences between textual segments in the source texts and in their corresponding summaries. Studying these alignments allows a better understanding of the multi-document summarization process, which may inform new summarization models that produce more informative summaries. In this paper, we investigate several approaches to text-summary sentence alignment, including superficial, deep and hybrid approaches. Our results show that superficial approaches may achieve very good results.
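
To make the task concrete, the sketch below aligns summary sentences to source-document sentences with a simple word-overlap score. It is only an illustration of what a superficial approach could look like, not the method evaluated in the paper: the function names, the Jaccard-style score, and the threshold of 0.3 are all assumptions introduced here.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def word_overlap(a, b):
    """Jaccard-style overlap between the token sets of two sentences."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align(summary_sentences, documents, threshold=0.3):
    """Pair each summary sentence with every source sentence that overlaps enough.

    `documents` is a list of documents, each a list of sentences.
    Returns (summary_idx, doc_idx, sentence_idx, score) tuples.
    The threshold is a hypothetical value chosen for illustration.
    """
    alignments = []
    for i, summary_sentence in enumerate(summary_sentences):
        for d, doc in enumerate(documents):
            for j, source_sentence in enumerate(doc):
                score = word_overlap(summary_sentence, source_sentence)
                if score >= threshold:
                    alignments.append((i, d, j, score))
    return alignments


# Toy example (invented sentences, not taken from any corpus).
summary = ["The storm closed the airport on Monday."]
documents = [
    ["A storm closed the city airport on Monday morning.",
     "Officials expect flights to resume soon."],
    ["The airport was closed by the storm.",
     "Schools stayed open."],
]
for summary_idx, doc_idx, sent_idx, score in align(summary, documents):
    print(summary_idx, doc_idx, sent_idx, round(score, 2))
```

A single summary sentence may align with sentences from more than one source document, which is why the sketch returns every pair above the threshold rather than only the best match.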

Author information

Authors and Affiliations

Interinstitutional Center for Computational Linguistics (NILC), Institute of Mathematical and Computer Sciences, University of São Paulo, Brazil
Verônica Agostini, Roque Enrique López Condori & Thiago Alexandre Salgueiro Pardo

Editor information

Editors and Affiliations

FCHS, Universidade do Algarve, Campus de Gambelas, 8005-139, Faro, Portugal
Jorge Baptista
INESC-ID Lisboa, Lisbon, Portugal
Nuno Mamede
IT-University of Coimbra, Coimbra, Portugal
Sara Candeias
USP-EACH, São Paulo-SP, Brazil
Ivandré Paraboni
USP-ICMC, Universidade de São Paulo, São Carlos, SP, Brazil
Thiago A. S. Pardo
SCC-ICMC, University of São Paulo, São Carlos, SP, Brazil
Maria das Graças Volpe Nunes

About this paper

Cite this paper

Agostini, V., López Condori, R.E., Pardo, T.A.S. (2014). Automatic Alignment of News Texts and Their Multi-document Summaries: Comparison among Methods. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_25

DOI: https://doi.org/10.1007/978-3-319-09761-9_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science, Computer Science (R0)
