Automatic Alignment of News Texts and Their Multi-document Summaries: Comparison among Methods

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2014)

Abstract

Aligning texts and their multi-document summaries is the task of determining the correspondences between textual segments in the source texts and in their summaries. Studying these alignments gives a better understanding of the multi-document summarization process, which may inform new summarization models that produce more informative summaries. In this paper, we investigate several approaches to text-summary sentence alignment, including superficial, deep and hybrid approaches. Our results show that superficial approaches can perform very well.
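To make the task concrete, the following is a minimal sketch of what a superficial sentence aligner might look like. It is not the method evaluated in the paper, whose details are not available in this preview. The Jaccard word-overlap measure, the function names (`tokenize`, `word_overlap`, `align`), the threshold of 0.3, and the example sentences are all assumptions chosen for illustration.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def word_overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align(summary_sentences, source_sentences, threshold=0.3):
    """Pair each summary sentence with every source sentence whose
    overlap score reaches the (assumed) threshold.

    Returns a list of (summary_index, source_index, score) tuples.
    """
    alignments = []
    for i, summary_sentence in enumerate(summary_sentences):
        for j, source_sentence in enumerate(source_sentences):
            score = word_overlap(summary_sentence, source_sentence)
            if score >= threshold:
                alignments.append((i, j, score))
    return alignments


# Hypothetical example data, not taken from the paper's corpus.
summary = ["The plane crashed in Congo."]
sources = [
    "A plane crashed in the Congo on Thursday.",
    "The government raised taxes.",
]
print(align(summary, sources))
```

A summary sentence may align with several source sentences, possibly from different documents, which is why the sketch keeps every pair above the threshold instead of only the best match.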

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Agostini, V., López Condori, R.E., Pardo, T.A.S. (2014). Automatic Alignment of News Texts and Their Multi-document Summaries: Comparison among Methods. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_25

  • DOI: https://doi.org/10.1007/978-3-319-09761-9_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09760-2

  • Online ISBN: 978-3-319-09761-9

  • eBook Packages: Computer Science (R0)