skip to main content
10.1145/2020408.2020580acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
poster

GBASE: a scalable and general graph management system

Published: 21 August 2011 Publication History

Abstract

Graphs appear in numerous applications including cyber-security, the Internet, social networks, protein networks, recommendation systems, and many more. Graphs with millions or even billions of nodes and edges are common-place. How to store such large graphs efficiently? What are the core operations/queries on those graph? How to answer the graph queries quickly? We propose GBASE, a scalable and general graph management and mining system. The key novelties lie in 1) our storage and compression scheme for a parallel setting and 2) the carefully chosen graph operations and their efficient implementation. We designed and implemented an instance of GBASE using MapReduce/Hadoop. GBASE provides a parallel indexing mechanism for graph mining operations that both saves storage space, as well as accelerates queries. We ran numerous experiments on real graphs, spanning billions of nodes and edges, and we show that our proposed GBASE is indeed fast, scalable and nimble, with significant savings in space and time.

References

[1]
Hadoop information. http://hadoop.apache.org/.
[2]
Daniel J. Abadi, Peter A. Boncz, and Stavros Harizopoulos. Column oriented database systems. PVLDB, 2009.
[3]
Daniel J. Abadi, Samuel Madden, and Nabil Hachem. Column-stores vs. row-stores: how different are they really? In SIGMOD Conference, pages 967--980, 2008.
[4]
Louigi Addario-Berry, W. Sean Kennedy, Andrew D. King, Zhentao Li, and Bruce A. Reed. Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph. Discrete Applied Mathematics, 158(7):765--770, 2010.
[5]
Charu C. Aggarwal, Yan Xie, and Philip S. Yu. Gconnect: A connectivity index for massive disk-resident graphs. PVLDB, 2(1):862--873, 2009.
[6]
Leman Akoglu, Mary McGlohon, and Christos Faloutsos. oddball: Spotting anomalies in weighted graphs. In PAKDD (2), pages 410--421, 2010.
[7]
I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. k-core decompositions: A tool for the visualization of large scale networks. http://arxiv.org/abs/cs.NI/0504107.
[8]
Periklis Andritsos, Renée J. Miller, and Panayiotis Tsaparas. Information-theoretic tools for mining database structure from large data sets. In SIGMOD, 2004.
[9]
David A. Bader, Shiva Kintali, Kamesh Madduri, and Milena Mihail. Approximating betweenness centrality. WAW, 2007.
[10]
Paolo Boldi and Sebastiano Vigna. The webgraph framework i: compression techniques. In WWW, 2004.
[11]
Ronnie Chaiken, Bob Jenkins, Per-Ake Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. Scope: easy and efficient parallel processing of massive data sets. VLDB, 2008.
[12]
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. On compressing social networks. In KDD, 2009.
[13]
Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. OSDI'04, December 2004.
[14]
P. Erdos and A. Rényi. On random graphs. Publicationes Mathematicae, 6:290--297, 1959.
[15]
Robert L. Grossman and Yunhong Gu. Data mining using high performance data clouds: experimental studies using sector and sphere. KDD, 2008.
[16]
Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, and Peter A. Boncz. Positional update handling in column stores. In SIGMOD, 2010.
[17]
Michael Isard and Yuan Yu. Distributed data-parallel computing using a high-level programming language. In SIGMOD Conference, pages 987--994, 2009.
[18]
Milena Ivanova, Martin L. Kersten, Niels J. Nes, and Romulo Goncalves. An architecture for recycling intermediates in a column-store. In SIGMOD, 2009.
[19]
U Kang, C.E Tsourakakis, Ana Paula Appel, C Faloutsos, and Jure Leskovec. Radius plots for mining tera-byte scale graphs: Algorithms, patterns, and observations. SIAM International Conference on Data Mining, 2010.
[20]
U Kang, C.E Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. ICDM, 2009.
[21]
G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278--300, 1999.
[22]
George Karypis and Vipin Kumar. Multilevel k-way hypergraph partitioning. In DAC, pages 343--348, 1999.
[23]
Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, and Christos Faloutsos. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. PKDD, pages 133--145, 2005.
[24]
Ching-Yung Lin, Nan Cao, Shixia Liu, Spiros Papadimitriou, Jimeng Sun, and Xifeng Yan. Smallblue: Social network analysis for expertise search and collective intelligence. In ICDE, pages 1483--1486, 2009.
[25]
Chao Liu, Fan Guo, and Christos Faloutsos. Bbm: bayesian browsing model from petabyte-scale data. In KDD, 2009.
[26]
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD Conference, pages 135--146, 2010.
[27]
Hossein Maserrat and Jian Pei. Neighbor query friendly compression of social networks. In KDD, 2010.
[28]
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD '08, pages 1099--1110, 2008.
[29]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
[30]
Spiros Papadimitriou and Jimeng Sun. Disco: Distributed co-clustering with map-reduce. ICDM, 2008.
[31]
Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. Interpreting the data: Parallel analysis with sawzall. Scientific Programming Journal, 2005.
[32]
Purnamrita Sarkar and Andrew W. Moore. Fast nearest-neighbor search in disk-resident graphs. In KDD, pages 513--522, 2010.
[33]
Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. C-store: A column-oriented dbms. In VLDB, 2005.
[34]
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM, pages 418--425, 2005.
[35]
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast random walk with restart and its applications. In ICDM, pages 613--622, 2006.
[36]
Silke Trißl and Ulf Leser. Fast and practical indexing and querying of very large graphs. In SIGMOD, 2007.
[37]
D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In VLDB, pages 709--720, 2005.
[38]
Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu. Mining significant graph patterns by leap search. In SIGMOD Conference, pages 433--444, 2008.
[39]
Peixiang Zhao, Jeffrey Xu Yu, and Philip S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, 2007.

Cited By

View all

Index Terms

  1. GBASE: a scalable and general graph management system

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2011
    1446 pages
    ISBN:9781450308137
    DOI:10.1145/2020408
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 August 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. compression
    2. distributed computing
    3. graph
    4. indexing

    Qualifiers

    • Poster

    Conference

    KDD '11
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Upcoming Conference

    KDD '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)10
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 28 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)CAVE: Concurrency-Aware Graph Processing on SSDsProceedings of the ACM on Management of Data10.1145/36549282:3(1-26)Online publication date: 30-May-2024
    • (2023)Hierarchical data structures for flowchartScientific Reports10.1038/s41598-023-31968-z13:1Online publication date: 9-Apr-2023
    • (2022)iRun: Horizontal and Vertical Shape of a Region-Based Graph CompressionSensors10.3390/s2224989422:24(9894)Online publication date: 15-Dec-2022
    • (2022)Multi-relation Graph SummarizationACM Transactions on Knowledge Discovery from Data10.1145/349456116:5(1-30)Online publication date: 9-Mar-2022
    • (2022)Schema Formalism for Semantic Summary Based on Labeled Graph from Heterogeneous DataRecent Challenges in Intelligent Information and Database Systems10.1007/978-981-19-8234-7_3(27-44)Online publication date: 24-Nov-2022
    • (2022)Graph Processing FrameworksEncyclopedia of Big Data Technologies10.1007/978-3-319-63962-8_283-2(1-11)Online publication date: 10-Mar-2022
    • (2020)FlexGraph: Flexible partitioning and storage for scalable graph miningPLOS ONE10.1371/journal.pone.022703215:1(e0227032)Online publication date: 24-Jan-2020
    • (2020)GraphOneACM Transactions on Storage10.1145/336418015:4(1-40)Online publication date: 16-Jan-2020
    • (2020)A Hybrid Update Strategy for I/O-Efficient Out-of-Core Graph ProcessingIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2020.297314331:8(1767-1782)Online publication date: 1-Aug-2020
    • (2020)Distributed Hypergraph Processing Using Intersection GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.3022014(1-1)Online publication date: 2020
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media