A Hybrid Distributed Architecture for Indexing

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5714)

Abstract

This paper presents a hybrid scavenger grid as an underlying hardware architecture for search services within digital libraries. The hybrid scavenger grid consists of both dedicated servers and dynamic resources in the form of idle workstations to handle medium- to large-scale search engine workloads. The dedicated resources are expected to have reliable and predictable behaviour. The dynamic resources are used opportunistically without any guarantees of availability. Test results confirmed that indexing performance is directly related to the size of the hybrid grid and intranet networking does not play a major role. A system-efficiency and cost-effectiveness comparison of a grid and a multiprocessor machine showed that for workloads of modest to large sizes, the grid architecture delivers better throughput per unit cost than the multiprocessor, at a system efficiency that is comparable to that of the multiprocessor.
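The two comparison metrics named in the abstract, throughput per unit cost and system efficiency, can be illustrated with a minimal sketch. The definitions below are common textbook ones (efficiency as speedup divided by node count) and are assumed here, not taken from the paper. The `Run` type, the function names, and all numbers are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One indexing run on some architecture (all values hypothetical)."""
    data_gb: float   # amount of data indexed
    seconds: float   # wall-clock indexing time
    nodes: int       # processors or grid nodes used
    cost: float      # total system cost, in any consistent currency unit


def throughput(run: Run) -> float:
    """Data indexed per second of wall-clock time."""
    return run.data_gb / run.seconds


def throughput_per_cost(run: Run) -> float:
    """Throughput divided by system cost (higher is more cost-effective)."""
    return throughput(run) / run.cost


def efficiency(run: Run, baseline: Run) -> float:
    """Speedup over a single-node baseline, divided by the node count."""
    speedup = baseline.seconds / run.seconds
    return speedup / run.nodes


# Made-up figures, not measurements from the paper.
baseline = Run(data_gb=100, seconds=8000, nodes=1, cost=1000)
grid = Run(data_gb=100, seconds=1250, nodes=8, cost=8000)
multiprocessor = Run(data_gb=100, seconds=1000, nodes=8, cost=20000)

for name, run in [("grid", grid), ("multiprocessor", multiprocessor)]:
    print(name, efficiency(run, baseline), throughput_per_cost(run))
```

With these invented inputs the multiprocessor is somewhat more efficient, while the grid indexes more data per unit cost. This is the kind of trade-off the abstract describes, though the paper's actual figures are not reproduced here.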

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nakashole, N., Suleman, H. (2009). A Hybrid Distributed Architecture for Indexing. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04346-8_25

  • DOI: https://doi.org/10.1007/978-3-642-04346-8_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04345-1

  • Online ISBN: 978-3-642-04346-8

  • eBook Packages: Computer Science, Computer Science (R0)
