Abstract
Aligning texts with their multi-document summaries is the task of determining the correspondences between textual segments in the source texts and in their summaries. Studying these alignments gives a better understanding of the multi-document summarization process, which may support new summarization models that produce more informative summaries. In this paper, we investigate several approaches to text-summary sentence alignment, including superficial, deep and hybrid approaches. Our results show that superficial approaches can achieve very good results.
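To make the task concrete, the following is a minimal sketch of what a superficial sentence-alignment method could look like, assuming plain word overlap (Jaccard similarity) as the similarity measure. It is an illustration only, not the method evaluated in the paper: the function names, the tokenization, and the threshold value of 0.3 are all assumptions introduced here.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def word_overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def align(summary_sentences, source_sentences, threshold=0.3):
    """Pair each summary sentence with every source sentence whose
    overlap score reaches the threshold (a hypothetical cut-off).

    Returns a list of (summary_index, source_index, score) triples.
    """
    alignments = []
    for i, summary_sentence in enumerate(summary_sentences):
        for j, source_sentence in enumerate(source_sentences):
            score = word_overlap(summary_sentence, source_sentence)
            if score >= threshold:
                alignments.append((i, j, score))
    return alignments


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    summary = ["The storm closed the airport."]
    sources = [
        "A strong storm closed the city airport on Monday.",
        "Officials expect flights to resume soon.",
    ]
    for i, j, score in align(summary, sources):
        print(f"summary[{i}] <-> source[{j}] (overlap {score:.2f})")
```

In a multi-document setting, the source sentences from all texts in a cluster would be pooled (keeping track of which text each came from) before calling `align`, and a summary sentence may legitimately align with several source sentences.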
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Agostini, V., López Condori, R.E., Pardo, T.A.S. (2014). Automatic Alignment of News Texts and Their Multi-document Summaries: Comparison among Methods. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_25
DOI: https://doi.org/10.1007/978-3-319-09761-9_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science (R0)