
A Sentence Compression Module for Machine-Assisted Subtitling

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)



This paper presents a sentence compression module used in a machine-assisted subtitling application developed in the European e-content project e-title. Our approach to compression and the architecture of the system are motivated by the commercial and multilingual nature of the project: the module must output reasonable compressions, and it must be easy to add new strategies, genres and languages. The compression module currently works for Catalan and English. It uses the Constraint Grammar engine both for linguistic preprocessing and for the linguistically motivated compression rules, which provides a homogeneous format throughout the compression process. The compression rules were implemented on the basis of a corpus of automatically aligned <script,subtitle> pairs of films in both languages. For both languages, we performed an automatic quantitative evaluation of the compression using the aligned corpus and a qualitative manual evaluation of grammaticality and informativeness.
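As a rough illustration of the idea summarized above, the following minimal sketch applies a deletion rule to linguistically tagged tokens and measures how much of the sentence survives. It is not the Constraint Grammar formalism and not the module described in the paper: the `Token` class, the tag names, the `REMOVABLE_TAGS` set and the character-based `compression_rate` measure are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str  # surface word
    tag: str   # hypothetical tag assigned by a preprocessing step


# Hypothetical rule: tokens carrying these tags may be dropped.
REMOVABLE_TAGS = frozenset({"INTERJ", "ADV-INTENS"})


def compress(tokens):
    """Return the tokens that survive the deletion rule."""
    return [t for t in tokens if t.tag not in REMOVABLE_TAGS]


def to_text(tokens):
    """Join token forms with single spaces."""
    return " ".join(t.form for t in tokens)


def compression_rate(original, compressed):
    """Fraction of characters kept (1.0 means nothing was removed)."""
    orig_len = len(to_text(original))
    return len(to_text(compressed)) / orig_len if orig_len else 1.0


sentence = [
    Token("well", "INTERJ"),
    Token("I", "PRON"),
    Token("really", "ADV-INTENS"),
    Token("like", "V"),
    Token("it", "PRON"),
]
compressed = compress(sentence)
text = to_text(compressed)
rate = compression_rate(sentence, compressed)
```

Keeping preprocessing output and rule input in one token representation mirrors, in a very simplified way, the homogeneous format the abstract mentions; a quantitative evaluation could then compare such output against the aligned subtitle for each script sentence.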





Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bouayad-Agha, N., Gil, A., Valentin, O., Pascual, V. (2006). A Sentence Compression Module for Machine-Assisted Subtitling. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg.


  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32205-4

  • Online ISBN: 978-3-540-32206-1

  • eBook Packages: Computer Science, Computer Science (R0)
