Fault Tolerant P2P RIA Crawling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 9944)

Abstract

Rich Internet Applications (RIAs) have been widely used on the web over the last decade because they are more responsive and user-friendly than traditional web applications. Distributed RIA crawling was introduced to reduce crawling time, which is long because RIAs are large. However, current RIA crawling systems do not tolerate failures in any of their components. In this paper, we address the resilience problem when crawling RIAs in a distributed environment and introduce an efficient, fault-tolerant RIA crawling system. Our approach is to partition the RIA model that results from the crawling over several storage devices in a peer-to-peer (P2P) network. This removes the single point of failure from the distributed data structure. We introduce three data recovery mechanisms for crawling RIAs in an unreliable environment: the Retry, the Redundancy, and the Combined mechanisms. We evaluate the performance of the recovery mechanisms and their impact on crawling performance through analytical reasoning.
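The abstract does not spell out how the RIA model is partitioned or how the recovery mechanisms work, so the following is only an illustrative sketch and not the authors' design. It assumes a consistent-hashing ring (in the style of Chord-like overlays) that assigns each application state to a node, and it interprets "redundancy" as keeping a copy on the next node of the ring. The class `Ring`, the function `ring_position`, the `replicas` parameter, and the state and event names are all hypothetical.

```python
import hashlib
from bisect import bisect_right


def ring_position(key: str, bits: int = 32) -> int:
    """Map a key to a position on a hash ring (hypothetical helper)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


class Ring:
    """Toy partitioned store: each state lives on `replicas` consecutive nodes."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        # Nodes sorted by their position on the ring.
        self.nodes = sorted((ring_position(n), n) for n in node_names)
        self.stores = {n: {} for n in node_names}
        self.alive = {n: True for n in node_names}

    def owners(self, state_id):
        """Return the primary node for a state and its ring successors."""
        positions = [p for p, _ in self.nodes]
        start = bisect_right(positions, ring_position(state_id)) % len(self.nodes)
        count = min(self.replicas, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)][1] for i in range(count)]

    def put(self, state_id, transitions):
        """Store the transitions of a state on every live owner."""
        for name in self.owners(state_id):
            if self.alive[name]:
                self.stores[name][state_id] = list(transitions)

    def get(self, state_id):
        """Read from the first live owner that still holds the state."""
        for name in self.owners(state_id):
            if self.alive[name] and state_id in self.stores[name]:
                return self.stores[name][state_id]
        return None  # data lost: the state would have to be rediscovered

    def fail(self, name):
        """Simulate a crash that wipes the node's local data."""
        self.alive[name] = False
        self.stores[name].clear()


ring = Ring(["n0", "n1", "n2", "n3"], replicas=2)
ring.put("state-1", ["click#menu"])
ring.fail(ring.owners("state-1")[0])   # the primary node crashes
recovered = ring.get("state-1")        # still served by the successor copy
```

In this sketch, a store built with `replicas=1` returns `None` after the primary fails; under the same assumptions, a retry-style recovery would then have to rebuild the lost state by crawling it again, trading extra crawling work for lower storage and messaging overhead.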

Notes

  1. http://www.alari.ch/people/derino/apps/bebop/index.php/ (Local version: http://ssrg.eecs.uottawa.ca/bebop/).

Author information

Corresponding author

Correspondence to Khaled Ben Hafaiedh.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ben Hafaiedh, K., von Bochmann, G., Jourdan, GV., Onut, I.V. (2016). Fault Tolerant P2P RIA Crawling. In: Abdulla, P., Delporte-Gallet, C. (eds) Networked Systems. NETYS 2016. Lecture Notes in Computer Science, vol 9944. Springer, Cham. https://doi.org/10.1007/978-3-319-46140-3_3

  • DOI: https://doi.org/10.1007/978-3-319-46140-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46139-7

  • Online ISBN: 978-3-319-46140-3

  • eBook Packages: Computer Science, Computer Science (R0)
