Cross-Language Access to Recorded Speech in the MALACH Project

  • Conference paper
Text, Speech and Dialogue (TSD 2002)

Abstract

The MALACH project seeks to help users find information in a vast multilingual collection of untranscribed oral history interviews. This paper introduces the goals of the project and focuses on supporting access by users who are unfamiliar with the interview language. It begins with a review of the state of the art in cross-language speech retrieval and then describes the approaches that will be investigated in the project. Czech was selected as the first non-English language to be supported, so results of an initial experiment with Czech/English cross-language retrieval are reported.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oard, D.W. et al. (2002). Cross-Language Access to Recorded Speech in the MALACH Project. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2002. Lecture Notes in Computer Science, vol 2448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46154-X_8

  • DOI: https://doi.org/10.1007/3-540-46154-X_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44129-8

  • Online ISBN: 978-3-540-46154-8

  • eBook Packages: Springer Book Archive
