DOI: 10.1145/1076034.1076124
Article

Bootstrapping dictionaries for cross-language information retrieval

Published: 15 August 2005

ABSTRACT

The bottleneck for dictionary-based cross-language information retrieval is the lack of comprehensive dictionaries for many languages. We introduce a methodology by which multilingual dictionaries (here, for Spanish and Swedish) emerge automatically from simple seed lexicons. These seed lexicons are themselves generated automatically, by cognate mapping, from previously manually constructed Portuguese, German, and English sources. Lexical and semantic hypotheses are then validated, and new ones are iteratively generated, using the co-occurrence patterns of hypothesized translation synonyms in parallel corpora. We evaluate the derived dictionaries on a large medical document collection in a cross-language retrieval setting.
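The abstract describes a two-stage loop: cognate mapping proposes seed translation hypotheses, and co-occurrence in parallel corpora both validates them and proposes new ones. The sketch below illustrates that idea for Spanish-English only and is not the authors' implementation. Everything concrete in it is an assumption: the Portuguese-to-Spanish rewrite rules are hypothetical and illustrative, the Dice coefficient stands in for whatever association measure the paper uses, the greedy one-to-one linking and the removal of already-translated words between rounds are our own design choices, tokenization is plain whitespace splitting, and the threshold of 0.6 is arbitrary. The function names `cognate_map`, `seed_lexicon`, `dice_scores`, and `bootstrap` are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical Portuguese -> Spanish orthographic rewrite rules (illustrative only,
# not the rule set used in the paper).
PT_TO_ES_RULES = [("ções", "ciones"), ("ção", "ción"), ("lh", "ll"), ("nh", "ñ")]


def cognate_map(word, rules=PT_TO_ES_RULES):
    """Propose a Spanish spelling for a Portuguese word by substring rewriting."""
    for src, dst in rules:
        word = word.replace(src, dst)
    return word


def seed_lexicon(pt_en, es_vocab):
    """Turn a Portuguese-English dictionary into Spanish-English hypotheses,
    keeping only mapped forms that actually occur in the Spanish corpus."""
    return {(cognate_map(pt), en) for pt, en in pt_en.items()
            if cognate_map(pt) in es_vocab}


def dice_scores(pairs):
    """Dice coefficient 2*C(s,t) / (C(s) + C(t)), counted over aligned sentence pairs."""
    src_n, tgt_n, joint = Counter(), Counter(), Counter()
    for src_toks, tgt_toks in pairs:
        s, t = set(src_toks), set(tgt_toks)
        src_n.update(s)
        tgt_n.update(t)
        joint.update(product(s, t))
    return {(s, t): 2 * c / (src_n[s] + tgt_n[t]) for (s, t), c in joint.items()}


def bootstrap(seed, corpus, threshold=0.6, max_rounds=10):
    """Validate seed hypotheses, then iteratively add new translation pairs."""
    pairs = [(es.split(), en.split()) for es, en in corpus]
    # Validation: keep only seed hypotheses supported by co-occurrence evidence.
    scores = dice_scores(pairs)
    lexicon = {p for p in seed if scores.get(p, 0.0) >= threshold}
    for _ in range(max_rounds):
        used_src = {s for s, _ in lexicon}
        used_tgt = {t for _, t in lexicon}
        # Drop already-translated words so the remaining statistics sharpen.
        reduced = [([w for w in s if w not in used_src],
                    [w for w in t if w not in used_tgt]) for s, t in pairs]
        ranked = sorted(((d, s, t) for (s, t), d in dice_scores(reduced).items()
                         if d >= threshold), key=lambda x: (-x[0], x[1], x[2]))
        added = set()
        for _, s, t in ranked:  # greedy one-to-one linking, best score first
            if s not in used_src and t not in used_tgt:
                added.add((s, t))
                used_src.add(s)
                used_tgt.add(t)
        if not added:
            break
        lexicon |= added
    return lexicon


if __name__ == "__main__":
    corpus = [("infección grave", "severe infection"),
              ("infección aguda", "acute infection"),
              ("dolor grave", "severe pain")]
    es_vocab = {w for es, _ in corpus for w in es.split()}
    # "coração" maps to "coración", which is absent from the corpus and is dropped.
    seed = seed_lexicon({"infecção": "infection", "coração": "heart"}, es_vocab)
    print(sorted(bootstrap(seed, corpus)))
```

On this toy corpus the cognate-derived seed pair ("infección", "infection") survives validation, and removing it in the first round leaves "grave", "aguda", and "dolor" each linked to their best-scoring English partner. A seed hypothesis with weak co-occurrence support, such as ("grave", "infection") with a Dice score of 0.5, is rejected rather than kept.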


        • Published in

          ACM Conferences
          SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
          August 2005
          708 pages
          ISBN: 1595930345
          DOI: 10.1145/1076034

          Copyright © 2005 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Acceptance Rates

          Overall Acceptance Rate: 792 of 3,983 submissions, 20%
