Abstract
Rich Internet Applications (RIAs) have been widely adopted on the web over the last decade because they are more responsive and user-friendly than traditional web applications. Distributed RIA crawling was introduced to reduce crawling time, which is long because of the large size of RIAs. However, current RIA crawling systems do not tolerate failures of their components. In this paper, we address the resilience problem when crawling RIAs in a distributed environment and introduce an efficient, fault-tolerant RIA crawling system. Our approach is to partition the RIA model that results from the crawl over several storage devices in a peer-to-peer (P2P) network, which removes the single point of failure from the distributed data structure. We introduce three data recovery mechanisms for crawling RIAs in an unreliable environment: the Retry, the Redundancy, and the Combined mechanisms. We evaluate the performance of the recovery mechanisms and their impact on crawling performance through analytical reasoning.
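To make the idea of partitioning the RIA model over peers more concrete, the following is a minimal sketch of one generic way to do it: application states are hashed onto a ring of peers, and each state is also copied to the next peer on the ring so that a single failure does not lose it. This is an illustration under stated assumptions, not the system described in the paper. The class name `Ring`, the methods `owners`, `put`, `get`, and `fail`, the use of SHA-1, and the replica count are all hypothetical choices made for this example.

```python
import hashlib
from bisect import bisect_left


def ring_position(key: str) -> int:
    """Map a peer name or state ID to a point on the hash ring (assumed: SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical in-memory model of peers that store a partitioned RIA graph."""

    def __init__(self, peer_names, replicas=2):
        self.replicas = replicas
        # Peers sorted by their position on the ring.
        self.peers = sorted((ring_position(name), name) for name in peer_names)
        self.store = {name: {} for name in peer_names}
        self.failed = set()

    def owners(self, state_id):
        """Return the primary peer for a state, followed by its successors."""
        positions = [pos for pos, _ in self.peers]
        start = bisect_left(positions, ring_position(state_id)) % len(self.peers)
        count = min(self.replicas, len(self.peers))
        return [self.peers[(start + i) % len(self.peers)][1] for i in range(count)]

    def put(self, state_id, transitions):
        """Store the transitions of a state on every live owner."""
        for name in self.owners(state_id):
            if name not in self.failed:
                self.store[name][state_id] = transitions

    def get(self, state_id):
        """Read from the first live owner that still holds the state."""
        for name in self.owners(state_id):
            if name not in self.failed and state_id in self.store[name]:
                return self.store[name][state_id]
        return None  # All copies were lost; the state would have to be re-crawled.

    def fail(self, name):
        """Simulate a crash: the peer stops answering and loses its data."""
        self.failed.add(name)
        self.store[name].clear()
```

With two copies per state, `get` still succeeds after the primary owner fails and returns `None` only once both owners have failed. With a single copy, any failure of the owner loses the state, which is the situation a recovery mechanism has to handle.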
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Ben Hafaiedh, K., von Bochmann, G., Jourdan, GV., Onut, I.V. (2016). Fault Tolerant P2P RIA Crawling. In: Abdulla, P., Delporte-Gallet, C. (eds) Networked Systems. NETYS 2016. Lecture Notes in Computer Science, vol 9944. Springer, Cham. https://doi.org/10.1007/978-3-319-46140-3_3
DOI: https://doi.org/10.1007/978-3-319-46140-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46139-7
Online ISBN: 978-3-319-46140-3
eBook Packages: Computer Science (R0)