ABSTRACT
Graphs have become a ubiquitous and essential data representation for modeling real-world objects and their relationships. Today, various applications generate large amounts of graph data. Graph summarization techniques are crucial for uncovering useful insights into the patterns hidden in the underlying data. However, all existing work on graph summarization consists of single-process solutions, which therefore cannot scale to large graphs. In this paper, we introduce three distributed graph summarization algorithms to address this problem. Experimental results show that the proposed algorithms can produce good-quality summaries and scale well with increasing data size. To the best of our knowledge, this is the first work to study distributed graph summarization methods.
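To make the idea of a graph summary concrete, here is a minimal sketch of one common style of summary. Nodes are grouped by a categorical attribute into supernodes, and edges are aggregated into weighted superedges. This is an illustration under stated assumptions, not one of the three distributed algorithms proposed in the paper. The edge partitions stand in for data held by different workers, and the helper names `summarize_partition` and `merge_summaries` are hypothetical. Superedge counts are additive, so each partition can be summarized independently and the partial results summed, in the style of a map step followed by a reduce step.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]  # undirected edge between two node ids


def summarize_partition(edges: Iterable[Edge], group_of: Dict[int, str]) -> Counter:
    """Map-style step (hypothetical): count edges between attribute groups in one partition."""
    counts: Counter = Counter()
    for u, v in edges:
        # Sort the group pair so (A, B) and (B, A) map to the same superedge.
        a, b = sorted((group_of[u], group_of[v]))
        counts[(a, b)] += 1
    return counts


def merge_summaries(partials: Iterable[Counter]) -> Counter:
    """Reduce-style step (hypothetical): superedge counts are additive, so partials just sum."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return total


# Toy input: node -> attribute value, and the edge list split across two "workers".
group_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
partitions: List[List[Edge]] = [
    [(1, 2), (1, 3)],
    [(2, 4), (3, 5), (4, 5)],
]

partials = [summarize_partition(p, group_of) for p in partitions]
summary = merge_summaries(partials)
print(sorted(summary.items()))
# [(('A', 'A'), 1), (('A', 'B'), 2), (('B', 'C'), 2)]
```

The five input edges collapse into three weighted superedges over three supernodes. Real summarization methods typically go further, for example by refining groups so that superedges are dense, which requires coordination across partitions beyond a single summation.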
Index Terms
- Distributed Graph Summarization