Abstract
Parallel corpora encode extremely valuable linguistic knowledge, and recent advances in multilingual corpus linguistics make this knowledge easier to uncover. The linguistic decisions that human translators make in order to convey the meaning of the source text faithfully can be traced and used as evidence of linguistic facts which, in a monolingual context, might be unavailable to (or overlooked by) a computer program. Multilingual technologies, which are to a large extent language independent, provide powerful support for systematic and consistent cross-lingual studies and make it easier to build annotated linguistic resources for languages in which such resources are scarce or missing. In this paper we briefly present some of the underlying multilingual technologies and methodologies we developed for exploiting parallel corpora, and we discuss their relevance for cross-linguistic studies and applications.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tufiş, D. (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In: Ishida, T., Fussell, S.R., Vossen, P.T.J.M. (eds) Intercultural Collaboration. IWIC 2007. Lecture Notes in Computer Science, vol 4568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74000-1_8
DOI: https://doi.org/10.1007/978-3-540-74000-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73999-9
Online ISBN: 978-3-540-74000-1
eBook Packages: Computer Science, Computer Science (R0)