Performance Comparison of Clustered and Replicated Information Retrieval Systems

  • Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

The amount of information available over the Internet is increasing daily, as are the importance and scale of Web search engines. Systems based on a single centralised index present several problems (such as lack of scalability), which has led to the use of distributed information retrieval systems to search for and locate the required information effectively. A distributed retrieval system can be clustered and/or replicated. In this paper, using simulations, we present a detailed performance analysis, in terms of both throughput and response time, of a clustered system compared to a replicated system. In addition, we consider the effect of changes in the query topics over time. We show that the performance obtained for a clustered system does not improve on that of the best replicated system. Indeed, the main advantage of a clustered system is the reduction of network traffic. However, the use of a switched network eliminates the network bottleneck, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative effect on performance of changes over time in the query topics when a distributed clustered system is used. In contrast, the performance of a distributed replicated system is query independent.

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Cacheda, F., Carneiro, V., Plachouras, V., Ounis, I. (2007). Performance Comparison of Clustered and Replicated Information Retrieval Systems. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_14

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)
