Abstract
Originally seen as a problem in the translation of multilingual texts, the alignment of corresponding entities in two versions of a document has become a scientific research topic. This paper reviews relevant natural language processing methods and presents an alignment algorithm that takes into account both the linguistic features and the structural data present in modern multilingual documents.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ballim, A., Coray, G., Linden, A., Vanoirbeek, C. (1998). The use of automatic alignment on structured multilingual documents. In: Hersch, R.D., André, J., Brown, H. (eds) Electronic Publishing, Artistic Imaging, and Digital Typography. RIDT 1998. Lecture Notes in Computer Science, vol 1375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053292
DOI: https://doi.org/10.1007/BFb0053292
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64298-5
Online ISBN: 978-3-540-69718-3
eBook Packages: Springer Book Archive