Abstract
We present a parallel text retrieval prototype based on the vector space model, implemented on a low-cost PC cluster running the Linux operating system and using the PVM message-passing library. The prototype embeds an inverted file structure for fast retrieval. In several experiments derived from the standard TREC-9 collection, the prototype indexes up to 500,000 web pages per hour on a simple x86 machine. Using 2 machines, we also obtain a query response time of 5.4 seconds when searching the one and a half million TREC-9 web pages.
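To make the combination of an inverted file with vector space ranking concrete, the following is a minimal sketch and not the prototype's actual code. It assumes a document-partitioned layout in which each node indexes its own share of the collection, a simple tf-idf weighting with cosine length normalisation, and a final merge of the per-node top-k lists. The abstract specifies neither the weighting scheme nor the partitioning, so these choices, and all function names, are illustrative assumptions. The nodes are simulated in one process; no PVM calls are made.

```python
import heapq
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted file for one partition: term -> [(doc_id, tf), ...]."""
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            postings[term].append((doc_id, tf))
    return dict(postings)


def global_idf(indexes, n_docs):
    """Collection-wide idf (assumed form: log(1 + N/df)), shared by all nodes."""
    df = Counter()
    for postings in indexes:
        for term, plist in postings.items():
            df[term] += len(plist)
    return {term: math.log(1 + n_docs / d) for term, d in df.items()}


def doc_norms(postings, idf):
    """Euclidean length of each document's tf-idf vector."""
    squares = defaultdict(float)
    for term, plist in postings.items():
        for doc_id, tf in plist:
            squares[doc_id] += (tf * idf[term]) ** 2
    return {doc_id: math.sqrt(s) for doc_id, s in squares.items()}


def search_partition(postings, norms, idf, query, k):
    """Score one partition via its inverted file and return its local top-k."""
    scores = defaultdict(float)
    for term, qtf in Counter(query.lower().split()).items():
        # Only documents that contain the term are touched.
        for doc_id, tf in postings.get(term, []):
            scores[doc_id] += (qtf * idf[term]) * (tf * idf[term])
    # The query length is the same for every document, so it is omitted.
    ranked = [(score / norms[doc_id], doc_id) for doc_id, score in scores.items()]
    return heapq.nlargest(k, ranked)


def search(partitions, query, k=3):
    """Simulate the nodes in-process, then merge their local top-k lists."""
    indexes = [build_index(p) for p in partitions]
    idf = global_idf(indexes, sum(len(p) for p in partitions))
    local_hits = [
        search_partition(ix, doc_norms(ix, idf), idf, query, k) for ix in indexes
    ]
    merged = heapq.nlargest(k, (hit for hits in local_hits for hit in hits))
    return [doc_id for _, doc_id in merged]
```

In a message-passing setting, each call to `search_partition` would run on a separate node and only the short local top-k lists would travel back to the master, which keeps communication small compared with the size of the postings.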
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rungsawang, A., Laohakanniyom, A., Lertprasertkune, M. (2001). Low-Cost Parallel Text Retrieval Using PC-Cluster. In: Cotronis, Y., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2001. Lecture Notes in Computer Science, vol 2131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45417-9_56
DOI: https://doi.org/10.1007/3-540-45417-9_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42609-7
Online ISBN: 978-3-540-45417-5
eBook Packages: Springer Book Archive