Abstract
In this paper, we compare a number of Topiary-style headline generation systems. The Topiary system, developed at the University of Maryland with BBN, was the top-performing headline generation system at DUC 2004. Topiary-style headlines consist of a number of general topic labels followed by a compressed version of the lead sentence of a news story. The Topiary system uses a statistical learning approach to find topic labels for headlines, while our approach, the LexTrim system, identifies key summary words by analysing the lexical cohesive structure of a text. The performance of these systems is evaluated using the ROUGE evaluation suite on the DUC 2004 news stories collection. The results of these experiments show that a baseline system that identifies topic descriptors for headlines using term frequency counts outperforms the LexTrim and Topiary systems. A manual evaluation of the headlines also confirms this result.
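The term-frequency baseline and the Topiary-style headline layout described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stopword list, the tokenisation, the tie-breaking rule, the upper-case "LABELS: sentence" layout, and the function names `topic_descriptors` and `topiary_style_headline` are all assumptions made for illustration. The compressed lead sentence is passed in as a ready-made string, because sentence compression (parse-and-trim) is outside the scope of this sketch.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; the abstract does not specify one).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "to", "and", "for", "is", "was",
    "that", "with", "by", "at", "as", "it", "has", "have", "said",
}


def topic_descriptors(text, k=3):
    """Return the k most frequent non-stopword terms in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def topiary_style_headline(text, compressed_lead, k=3):
    """Prefix a compressed lead sentence with upper-cased topic labels."""
    labels = " ".join(t.upper() for t in topic_descriptors(text, k))
    return f"{labels}: {compressed_lead}"
```

Calling `topiary_style_headline(story, lead, k)` with a story text and an already-compressed lead sentence yields the `k` most frequent content words in upper case, a colon, and then the lead. Raw frequency counts are used here only because they are the simplest reading of "term frequency counts"; any weighting scheme would be a further assumption.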
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, R., Stokes, N., Doran, W.P., Newman, E., Carthy, J., Dunnion, J. (2005). Comparing Topiary-Style Approaches to Headline Generation. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_12
DOI: https://doi.org/10.1007/978-3-540-31865-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science, Computer Science (R0)