ABSTRACT
We study the problem of graph tracking with limited information, focusing on how to update a snapshot of a social graph. Suppose a system stores a partial snapshot, G1, of the social graph. Over time, G1 becomes out of date. We want to bring G1 up to date with the actual graph through a public API, subject to a limit on the number of API calls. Periodically recrawling every node in the snapshot is prohibitively expensive. We propose a scheme that exploits indegrees and outdegrees to discover changes to the actual graph. When the degree information is ambiguous, we probe the graph and verify individual edges. Our strategy is designed for limited information and can be adapted to different levels of staleness. We evaluate it against recrawling on real datasets and show that it uses an order of magnitude fewer API calls while introducing minimal errors.
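The degree-based idea can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the authors' algorithm. It assumes a hypothetical API with two calls, `get_degrees` (cheap, returns the reported indegree and outdegree of a node) and `get_out_neighbors` (expensive, returns a node's full out-neighbor list), and it assumes the degrees reported at snapshot time were stored in `old_degrees`. All of these names are illustrative.

```python
from typing import Callable, Dict, Set, Tuple

Graph = Dict[str, Set[str]]   # node -> set of out-neighbors
Degrees = Tuple[int, int]     # (indegree, outdegree) as reported by the API


def snapshot_indegrees(graph: Graph) -> Dict[str, int]:
    """Count in-edges that are visible inside the snapshot itself."""
    indeg = {node: 0 for node in graph}
    for neighbors in graph.values():
        for dst in neighbors:
            if dst in indeg:
                indeg[dst] += 1
    return indeg


def update_snapshot(
    snapshot: Graph,
    old_degrees: Dict[str, Degrees],
    get_degrees: Callable[[str], Degrees],          # hypothetical cheap call
    get_out_neighbors: Callable[[str], Set[str]],   # hypothetical expensive call
) -> Tuple[Graph, Set[str], int]:
    """Refresh only nodes whose reported outdegree changed.

    Returns the updated snapshot, the nodes whose indegree change is not
    explained by the refreshed edges (candidates for probing), and the
    number of expensive neighbor-list calls made.
    """
    updated = {node: set(nbrs) for node, nbrs in snapshot.items()}
    new_degrees = {node: get_degrees(node) for node in snapshot}

    list_calls = 0
    for node in snapshot:
        if new_degrees[node][1] != old_degrees[node][1]:
            updated[node] = set(get_out_neighbors(node))
            list_calls += 1

    # Compare the reported indegree change with the change we can observe
    # inside the snapshot after the refresh.
    before = snapshot_indegrees(snapshot)
    after = snapshot_indegrees(updated)
    unexplained = {
        node
        for node in snapshot
        if new_degrees[node][0] - old_degrees[node][0] != after[node] - before[node]
    }
    return updated, unexplained, list_calls
```

In this sketch, a node whose outdegree is unchanged is never refetched, so a node that gained one edge and lost another goes unnoticed. Nodes returned in `unexplained` mark where the degree information is ambiguous and further probing would be needed.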
Index Terms
- Updating an Existing Social Graph Snapshot via a Limited API