Abstract
This paper presents a sentence compression module used in a machine-assisted subtitling application developed in the European e-content project e-title. Our approach to compression and the architecture of the system are motivated by the commercial and multilingual nature of the project: the module must output reasonable compressions, and new strategies, genres and languages must be easy to add. The compression module currently works for Catalan and English. It uses the Constraint Grammar engine both for linguistic preprocessing and for the linguistically motivated compression rules, which provides a homogeneous format throughout the compression process. The compression rules were implemented based on a corpus of automatically aligned <script,subtitle> pairs of films for both languages. For both languages, we performed an automatic quantitative evaluation of the compression using the aligned corpus and a qualitative manual evaluation of grammaticality and informativeness.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bouayad-Agha, N., Gil, A., Valentin, O., Pascual, V. (2006). A Sentence Compression Module for Machine-Assisted Subtitling. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_51
DOI: https://doi.org/10.1007/11671299_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science (R0)