
Combining Evidence in Cognate Identification

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3060)

Abstract

Cognates are words of the same origin that belong to distinct languages. The problem of automatic identification of cognates arises in language reconstruction and bitext-related tasks. The evidence of cognation may come from various information sources, such as phonetic similarity, semantic similarity, and recurrent sound correspondences. I discuss ways of defining measures of these types of similarity and propose a method of combining them into an integrated cognate identification program. The new method requires no manual parameter tuning and performs well when tested on Indo-European and Algonquian lexical data.
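The abstract describes scoring a word pair by combining several kinds of evidence. The sketch below only illustrates that general idea; it is not the paper's method. It assumes normalized Levenshtein distance (the Wagner-Fischer dynamic program) as a stand-in for a phonetic similarity measure, an exact gloss match as a stand-in for semantic similarity, and a fixed linear weight to combine them. Recurrent sound correspondences are left out. All function names and the `weight` parameter are hypothetical, and unlike this sketch, the method described in the abstract is reported to need no manual parameter tuning.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free on a match)
            ))
        prev = cur
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; a stand-in for a phonetic measure."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def gloss_similarity(gloss_a: str, gloss_b: str) -> float:
    """Hypothetical semantic evidence: 1.0 if glosses match exactly, else 0.0."""
    return 1.0 if gloss_a.strip().lower() == gloss_b.strip().lower() else 0.0


def combined_score(word_a: str, word_b: str, gloss_a: str, gloss_b: str,
                   weight: float = 0.5) -> float:
    """Hypothetical fixed linear combination of the two evidence sources."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return ((1.0 - weight) * string_similarity(word_a, word_b)
            + weight * gloss_similarity(gloss_a, gloss_b))


if __name__ == "__main__":
    # Italian "notte" and Spanish "noche", both glossed "night"
    print(combined_score("notte", "noche", "night", "night"))
```

For this pair the edit distance is 2 over a length of 5, so the string similarity is 0.6; the glosses match, so with the default weight the combined score is about 0.8. A fixed weight is the simplest possible combination and is exactly the kind of hand-set parameter the abstract says the proposed method avoids.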




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kondrak, G. (2004). Combining Evidence in Cognate Identification. In: Tawfik, A.Y., Goodwin, S.D. (eds) Advances in Artificial Intelligence. Canadian AI 2004. Lecture Notes in Computer Science, vol 3060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24840-8_4


  • DOI: https://doi.org/10.1007/978-3-540-24840-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22004-6

  • Online ISBN: 978-3-540-24840-8

  • eBook Packages: Springer Book Archive
