Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts Using Bilingual Dictionaries

  • Conference paper
Text, Speech and Dialogue (TSD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4188)

Abstract

Aligned parallel corpora are important linguistic resources for many text processing tasks, such as machine translation, word sense disambiguation, and dictionary compilation. Nevertheless, few resources of this type are available, especially for fiction texts, because the texts are difficult to collect and manual alignment is costly. In this paper, we describe an automatically aligned English-Spanish parallel corpus of fiction texts and evaluate our alignment method, which uses linguistic data, namely existing bilingual dictionaries, to calculate word similarity. The method is based on a simple idea: if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. We report experimental results for alignment at the paragraph level.
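The idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary, the stop list, and the function name `paragraph_similarity` are all assumptions made for illustration, and the sketch skips steps such as lemmatization that a real system would likely need. It scores a candidate paragraph pair by the fraction of meaningful source words that have at least one dictionary translation in the target paragraph.

```python
from typing import Dict, Iterable, Set

# Toy English-Spanish dictionary (hypothetical; a real one would be far larger).
TOY_DICT: Dict[str, Set[str]] = {
    "house": {"casa", "hogar"},
    "dog": {"perro"},
    "white": {"blanco", "blanca"},
    "night": {"noche"},
}

# Tiny English stop list (hypothetical) used to keep only "meaningful" words.
STOP_WORDS: Set[str] = {"the", "a", "in", "of", "was", "and"}


def paragraph_similarity(
    source_tokens: Iterable[str],
    target_tokens: Iterable[str],
    dictionary: Dict[str, Set[str]],
    stop_words: Set[str],
) -> float:
    """Fraction of meaningful source words with a translation in the target."""
    target = {t.lower() for t in target_tokens}
    content = [w.lower() for w in source_tokens if w.lower() not in stop_words]
    if not content:
        return 0.0
    # A source word counts as a hit if any of its translations is in the target.
    hits = sum(1 for w in content if dictionary.get(w, set()) & target)
    return hits / len(content)


en = "the white dog was in the house".split()
es_good = "el perro blanco estaba en la casa".split()
es_bad = "la noche era larga".split()

score_good = paragraph_similarity(en, es_good, TOY_DICT, STOP_WORDS)
score_bad = paragraph_similarity(en, es_bad, TOY_DICT, STOP_WORDS)
```

In this toy case the matching Spanish paragraph scores higher than the unrelated one, so an aligner could prefer the pairing with the larger score. How scores are combined across paragraphs to produce a full alignment is not covered by this sketch.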

Work done under partial support of the Mexican Government (CONACyT, SNI) and the National Polytechnic Institute, Mexico (SIP, COFAA, PIFI).



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gelbukh, A., Sidorov, G., Vera-Félix, J.Á. (2006). Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts Using Bilingual Dictionaries. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_8

  • DOI: https://doi.org/10.1007/11846406_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39090-9

  • Online ISBN: 978-3-540-39091-6

  • eBook Packages: Computer Science, Computer Science (R0)
