Low-Cost Parallel Text Retrieval Using PC-Cluster

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2131)

Abstract

We present a parallel, vector-space-based text retrieval prototype implemented on a low-cost PC cluster running the Linux operating system and using the PVM message-passing library. For fast retrieval, the prototype also incorporates an inverted file structure. In several experiments on the standard TREC-9 collection, the prototype indexed up to 500,000 web pages per hour on a single ordinary x86 machine. Using two machines, it achieved a query response time of 5.4 seconds when searching the one and a half million TREC-9 web pages.
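
The abstract combines the vector space model with an inverted file. The paper's own implementation is not shown here, so the following is only a minimal single-machine sketch under assumed choices: whitespace tokenization, tf × ln(N/df) term weights, and cosine normalization of document vectors. The function names `build_index`, `doc_norms`, and `search` are hypothetical. The key point it illustrates is that an inverted file lets a query touch only the posting lists of its own terms instead of scanning every document.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted file mapping each term to its postings (doc_id, tf)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    return index


def doc_norms(index, n_docs):
    """Euclidean length of every document's tf-idf vector, for cosine scoring."""
    squares = [0.0] * n_docs
    for postings in index.values():
        idf = math.log(n_docs / len(postings))  # df = length of the posting list
        for doc_id, tf in postings:
            squares[doc_id] += (tf * idf) ** 2
    return [math.sqrt(s) for s in squares]


def search(index, norms, n_docs, query, k=10):
    """Rank documents by cosine similarity, reading only the query terms' postings."""
    scores = defaultdict(float)
    for term, qtf in Counter(query.lower().split()).items():
        postings = index.get(term)
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += (qtf * idf) * (tf * idf)
    # Dividing by the query norm would not change the ranking, so it is skipped.
    ranked = sorted(
        ((score / norms[d], d) for d, score in scores.items() if norms[d] > 0),
        reverse=True,
    )
    return [(d, score) for score, d in ranked[:k]]


docs = ["parallel text retrieval", "pc cluster linux", "text retrieval on a cluster"]
index = build_index(docs)
norms = doc_norms(index, len(docs))
print(search(index, norms, len(docs), "parallel retrieval"))
```

With this toy collection, document 0 ranks above document 2 for the query "parallel retrieval", and document 1 is never scored because it shares no query term.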

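The abstract does not state how the collection is split across the cluster. One common approach, assumed here, is document partitioning: each node keeps a local inverted file over its share of the pages, the master sends the query together with global idf weights, each node returns its local top-k, and the master merges those lists. In this sketch, Python threads merely stand in for PVM tasks (no PVM calls are used), the `Partition` class and the `global_idf` and `parallel_search` functions are hypothetical, and scores are unnormalized tf-idf dot products to keep it short.

```python
import heapq
import math
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


class Partition:
    """One node's share of the collection, with its own local inverted file."""

    def __init__(self, docs):  # docs: list of (global_doc_id, text) pairs
        self.n_docs = len(docs)
        self.index = defaultdict(list)
        for doc_id, text in docs:
            for term, tf in Counter(text.lower().split()).items():
                self.index[term].append((doc_id, tf))

    def doc_freqs(self):
        """Local document frequency of every term in this partition."""
        return {term: len(postings) for term, postings in self.index.items()}

    def top_k(self, query, idf, k):
        """Score local documents with global idf weights; return the local top-k."""
        scores = defaultdict(float)
        for term, qtf in query.items():
            for doc_id, tf in self.index.get(term, ()):
                scores[doc_id] += (qtf * idf[term]) * (tf * idf[term])
        return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


def global_idf(partitions):
    """Sum local document frequencies so every node weights terms identically."""
    df = Counter()
    for part in partitions:
        df.update(part.doc_freqs())
    n_docs = sum(part.n_docs for part in partitions)
    return {term: math.log(n_docs / f) for term, f in df.items()}


def parallel_search(partitions, idf, query, k=10):
    q = Counter(query.lower().split())
    # Send the query to every partition; threads stand in for per-node workers.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local_hits = list(pool.map(lambda p: p.top_k(q, idf, k), partitions))
    # Merge the per-node top-k lists into the global top-k.
    merged = heapq.nlargest(k, (hit for hits in local_hits for hit in hits))
    return [(d, s) for s, d in merged]


docs = ["parallel text retrieval", "pc cluster linux", "text retrieval on a cluster"]
n_nodes = 2
# Round-robin document partitioning: node r gets documents r, r + n_nodes, ...
partitions = [
    Partition([(i, docs[i]) for i in range(r, len(docs), n_nodes)])
    for r in range(n_nodes)
]
idf = global_idf(partitions)
print(parallel_search(partitions, idf, "text retrieval cluster", k=2))
```

Using global rather than per-node document frequencies is the important design choice: with local idf values, scores from different nodes would not be comparable and the merge step could misrank documents. Because each node returns at most k hits, the master merges at most k × (number of nodes) candidates per query.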

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rungsawang, A., Laohakanniyom, A., Lertprasertkune, M. (2001). Low-Cost Parallel Text Retrieval Using PC-Cluster. In: Cotronis, Y., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2001. Lecture Notes in Computer Science, vol 2131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45417-9_56

  • DOI: https://doi.org/10.1007/3-540-45417-9_56

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42609-7

  • Online ISBN: 978-3-540-45417-5

  • eBook Packages: Springer Book Archive
