Information Retrieval on an SCI-Based PC Cluster

The Journal of Supercomputing

Abstract

This article presents an efficient parallel information retrieval (IR) system that provides fast information service for Internet users in a low-cost, high-performance PC-NOW environment. The IR system is implemented on a PC cluster based on the Scalable Coherent Interface (SCI), a powerful interconnect that supports both shared-memory and message-passing models. In the IR system, the inverted-index file (IIF) is partitioned into pieces using a greedy declustering algorithm, and the pieces are distributed to the cluster nodes and stored on each node's hard disk. For each incoming user query with multiple terms, the terms are sent to the nodes holding the relevant pieces of the IIF and evaluated in parallel. The IR system is developed using a distributed shared memory (DSM) programming technique based on SCI. In the experiments, the IR system outperforms an MPI-based IR system that uses Fast Ethernet as its interconnect. A speed-up of up to 5.0 was obtained with an 8-node cluster in processing each query on a 500,000-document IIF.
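The partition-and-route idea described in the abstract can be illustrated with a small sketch. The abstract does not state which criterion the greedy declustering algorithm optimizes, so the version below is an assumption made purely for illustration: it balances total posting-list size across nodes by placing the largest terms first on the least-loaded node. The function names `greedy_decluster` and `route_query`, and the sample terms and sizes, are hypothetical and do not come from the article.

```python
import heapq
from collections import defaultdict


def greedy_decluster(posting_sizes, num_nodes):
    """Map each term to a node id.

    Illustrative heuristic (an assumption, not the article's algorithm):
    visit terms from largest to smallest posting list and give each one
    to the node with the smallest accumulated load so far.
    """
    # Min-heap of (accumulated load, node id); ties go to the lower node id.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    placement = {}
    # Sort by descending size, then by term, so the result is deterministic.
    for term, size in sorted(posting_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        placement[term] = node
        heapq.heappush(heap, (load + size, node))
    return placement


def route_query(terms, placement):
    """Group a query's terms by the node that stores their index piece.

    Terms that are not in the index are skipped.
    """
    by_node = defaultdict(list)
    for term in terms:
        if term in placement:
            by_node[placement[term]].append(term)
    return dict(by_node)


# Hypothetical posting-list sizes for a toy index.
sizes = {"cluster": 900, "sci": 400, "index": 700, "query": 300, "disk": 200}
placement = greedy_decluster(sizes, num_nodes=2)
routes = route_query(["sci", "index", "unknown"], placement)
```

Each entry of `routes` lists the terms that one node would evaluate against its local piece of the index, so different nodes can work on the same query at the same time. A declustering criterion based on other statistics, such as how often terms occur together in queries, would change only the placement step.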

Cite this article

Chung, SH., Kwon, HC., Ryu, K.R. et al. Information Retrieval on an SCI-Based PC Cluster. The Journal of Supercomputing 19, 251–265 (2001). https://doi.org/10.1023/A:1011178530932

  • DOI: https://doi.org/10.1023/A:1011178530932
