Abstract
Peer-to-peer (P2P) structures are one possible distributed solution for scaling Web Search Engines (WSEs). P2P structures are used successfully in many systems at lower cost than conventional distributed solutions. However, whether they can also benefit large-scale WSEs remains controversial. In this paper, we present the challenges of using P2P structures to design a large-scale WSE. Considering different types of P2P systems, we introduce possible P2P models for this purpose. Using quantitative evaluation, we compare these models from several aspects to determine which is best suited to constructing a large-scale WSE. Our studies indicate that traditional P2P structures are not good choices in this area, and that the best model may be a special case of Super-Peer Networks, which still depends on the peers’ active and trustworthy contributions.
The complete version of this paper is available on the first author’s website.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mousavi, H., Movaghar, A. (2008). Challenges in Using Peer-to-Peer Structures in Order to Design a Large-Scale Web Search Engine. In: Sarbazi-Azad, H., Parhami, B., Miremadi, SG., Hessabi, S. (eds) Advances in Computer Science and Engineering. CSICC 2008. Communications in Computer and Information Science, vol 6. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89985-3_57
DOI: https://doi.org/10.1007/978-3-540-89985-3_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89984-6
Online ISBN: 978-3-540-89985-3
eBook Packages: Computer Science, Computer Science (R0)