Abstract
This paper describes a statistically supported, language-independent method for aligning parallel texts (texts that are translations of each other or of a common source text). The approach builds on previous work by Ribeiro et al. (2000). The second statistical filter proposed by Ribeiro et al., which is based on Confidence Bands (CB), is replaced by the Longest Sorted Sequence Algorithm (LSSA), which is described in this paper. For Portuguese-French alignments, this yielded a 35% decrease in processing time and an 18% increase in the number of aligned segments. Similar results were obtained for Portuguese-English alignments. Both methods are compared and evaluated on a large parallel corpus of Portuguese, English and French texts (approximately 250Mb of text per language).
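The abstract names LSSA without defining it. As a rough illustration only, the sketch below assumes that the filter keeps the largest set of candidate correspondence points whose positions increase in both texts, which is the classic longest increasing subsequence problem. This reading, the function name `longest_sorted_sequence`, and the sample points are assumptions made for illustration; they are not the authors' definition or data.

```python
from bisect import bisect_left


def longest_sorted_sequence(points):
    """Return the longest chain of (x, y) points strictly increasing in x and y.

    Assumption: x is a token position in the source text and y is the
    position of its candidate match in the target text.
    """
    # Sort by x ascending; for equal x, sort y descending so that two points
    # sharing the same x can never both end up in the chain.
    pts = sorted(set(points), key=lambda p: (p[0], -p[1]))

    tails_y = []    # tails_y[k]: smallest y that ends a chain of length k + 1
    tails_idx = []  # index in pts of the point holding tails_y[k]
    prev = [-1] * len(pts)  # back-pointers used to rebuild the chain

    for i, (_, y) in enumerate(pts):
        k = bisect_left(tails_y, y)  # bisect_left keeps y strictly increasing
        if k == len(tails_y):
            tails_y.append(y)
            tails_idx.append(i)
        else:
            tails_y[k] = y
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1

    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]


# Hypothetical candidate points: (position in text A, position in text B).
candidates = [(10, 12), (20, 95), (30, 33), (40, 41), (50, 8), (60, 64)]
aligned = longest_sorted_sequence(candidates)
```

Under these assumptions, the points (20, 95) and (50, 8) are discarded because they would cross the remaining correspondences. The binary search keeps the running time at O(n log n) in the number of candidate points.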
References
Ribeiro, A., Lopes, G., Mexia, J.: Using Confidence Bands for Parallel Texts Alignment. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000), October 3-6, Hong Kong, China (2000)
Ribeiro, A., Dias, G., Lopes, G., Mexia, J.: Cognates Alignment. In: Maegaard, B. (ed.) Proceedings of the Machine Translation Summit VIII (MT Summit VIII), Santiago de Compostela, Spain, September 18-22. European Association for Machine Translation, pp. 287–292 (2001)
Ferreira da Silva, J., Dias, G., Guilloré, S., Pereira Lopes, J.G.: Using Local Maxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. In: Barahona, P., Alferes, J.J. (eds.) EPIA 1999. LNCS (LNAI), vol. 1695, pp. 113–132. Springer, Heidelberg (1999)
Danielsson, P., Mühlenbock, K.: Small but efficient: The misconception of high frequency words in Scandinavian translation. In: White, J.S. (ed.) AMTA 2000. LNCS (LNAI), vol. 1934, pp. 158–168. Springer, Heidelberg (2000)
Simard, M., Foster, G., Isabelle, P.: Using cognates to align sentences in bilingual corpora. In: Proceedings of the 4th International Conference on Theoretical and Methodological Issues in Machine Translation, vol. TMI-92, pp. 67–91 (1992)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ildefonso, T., Lopes, G.P. (2005). Longest Sorted Sequence Algorithm for Parallel Text Alignment. In: Moreno Díaz, R., Pichler, F., Quesada Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2005. EUROCAST 2005. Lecture Notes in Computer Science, vol 3643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11556985_13
DOI: https://doi.org/10.1007/11556985_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29002-5
Online ISBN: 978-3-540-31829-3
eBook Packages: Computer Science, Computer Science (R0)