Abstract
We study the applicability of a distributed computing technique to a million-page free-text document retrieval problem. We propose a high-performance DSIR retrieval algorithm for a Beowulf cluster of Pentium PCs, implemented with the PVM message-passing library. DSIR is a vector-space retrieval model in which the semantic similarity between documents and queries is characterized by semantic vectors derived from the document collection. Retrieving relevant answers then amounts to computing the geometric proximity between a large number of document vectors and the query vectors in a semantic vector space. We test this parallel DSIR algorithm on a large-scale TREC-7 collection, present the experimental results, and investigate both computing performance and problem-size scalability.
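The proximity computation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the proximity measure or the data distribution scheme, so cosine similarity, round-robin partitioning, and every function name here (`cosine`, `partition`, `local_top_k`, `retrieve`) are assumptions chosen for illustration. The workers are simulated by a sequential loop instead of PVM tasks.

```python
import heapq
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def partition(doc_ids, n_workers):
    """Split document ids into n_workers round-robin slices (assumed scheme)."""
    return [doc_ids[i::n_workers] for i in range(n_workers)]


def local_top_k(doc_vectors, doc_ids, query, k):
    """Work done by one hypothetical worker: score its slice, keep its best k."""
    scored = [(cosine(doc_vectors[d], query), d) for d in doc_ids]
    return heapq.nlargest(k, scored)


def retrieve(doc_vectors, query, k, n_workers):
    """Merge every worker's local top-k into one global ranking."""
    slices = partition(sorted(doc_vectors), n_workers)
    partial = []
    for ids in slices:  # on a cluster these calls would run concurrently
        partial.extend(local_top_k(doc_vectors, ids, query, k))
    return [doc for _, doc in heapq.nlargest(k, partial)]


# Toy data: four documents in a three-dimensional vector space.
docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [1.0, 1.0, 0.0],
    "d4": [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]
ranking = retrieve(docs, query, k=2, n_workers=2)
print(ranking)  # ['d1', 'd3']
```

Because each worker returns only its own k best candidates, the merge step handles at most k × n_workers scores regardless of collection size, which is one plausible reason a partitioned design of this kind scales with the number of documents.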
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rungsawang, A., Tangpong, A., Laohawee, P. (1999). Parallel DSIR Text Retrieval System. In: Dongarra, J., Luque, E., Margalef, T. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 1999. Lecture Notes in Computer Science, vol 1697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48158-3_40
DOI: https://doi.org/10.1007/3-540-48158-3_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66549-6
Online ISBN: 978-3-540-48158-4
eBook Packages: Springer Book Archive