Abstract
The MALACH project seeks to help users find information in vast multilingual collections of untranscribed oral history interviews. This paper introduces the goals of the project and focuses on supporting access by users who are unfamiliar with the interview language. It begins with a review of the state of the art in cross-language speech retrieval and then describes the approaches that will be investigated in the project. Czech was selected as the first non-English language to be supported, so results of an initial experiment with Czech/English cross-language retrieval are reported.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oard, D.W. et al. (2002). Cross-Language Access to Recorded Speech in the MALACH Project. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2002. Lecture Notes in Computer Science, vol 2448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46154-X_8
DOI: https://doi.org/10.1007/3-540-46154-X_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44129-8
Online ISBN: 978-3-540-46154-8
eBook Packages: Springer Book Archive