
Toward Language-dependent Applications

Machine Translation

Abstract

Is adaptation of English NLP applications the right way to go multilingual? Should one prefer "language-independent" systems with a view to applying them to a large number of different languages? Experience from the processing of Portuguese in several different areas (part-of-speech tagging, corpus tools, lexical decomposition, machine translation, etc.) suggests that neither of these offers a satisfactory solution.

This paper argues for a thorough study of the way individual languages work in order to develop applications suited for the language in question, i.e., "language-dependent" systems.

About this article

Cite this article

Santos, D. Toward Language-dependent Applications. Machine Translation 14, 83–112 (1999). https://doi.org/10.1023/A:1008169917741

  • DOI: https://doi.org/10.1023/A:1008169917741
