Abstract
Translation systems tend to have more trouble with long sentences than with short ones for a variety of reasons. When the source and target languages differ rather markedly, as do Japanese and English, this problem is reflected in lower-quality output. To improve readability, we experimented with automatically splitting long sentences into shorter ones. This paper outlines the problem, describes the sentence-splitting procedure and rules, and provides an evaluation of the results.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gerber, L., Hovy, E. (1998). Improving Translation Quality by Manipulating Sentence Length. In: Farwell, D., Gerber, L., Hovy, E. (eds) Machine Translation and the Information Soup. AMTA 1998. Lecture Notes in Computer Science, vol 1529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49478-2_40
DOI: https://doi.org/10.1007/3-540-49478-2_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65259-5
Online ISBN: 978-3-540-49478-2
eBook Packages: Springer Book Archive