Speedup Graph Processing by Graph Ordering

Authors:
Hao Wei, The Chinese University of Hong Kong, Hong Kong, Hong Kong
Jeffrey Xu Yu, The Chinese University of Hong Kong, Hong Kong, Hong Kong
Can Lu, The Chinese University of Hong Kong, Hong Kong, Hong Kong
Xuemin Lin, The University of New South Wales, Sydney, Australia

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 1813–1828
https://doi.org/10.1145/2882903.2915220

Published: 26 June 2016

ABSTRACT

CPU cache performance is a key factor in the efficiency of database systems. It has been reported that cache miss latency accounts for half of the execution time in database systems. To improve CPU cache performance, prior studies have supported searching with cache-oblivious and cache-conscious trees. In this paper, we focus on speeding up graph computing in general by reducing the CPU cache miss ratio across different graph algorithms. The approaches designed for trees are not applicable to graphs, which are complex in nature.

We explore a general approach to speeding up CPU computation, in order to further improve the efficiency of graph algorithms without changing the algorithms (their implementations) or the data structures they use. That is, we aim to design a general solution that is specific neither to a particular graph algorithm nor to a particular data structure.

The approach studied in this work is graph ordering: finding the optimal permutation of all nodes in a given graph that keeps nodes likely to be accessed together close to each other in the ordering, so as to minimize the CPU cache miss ratio.
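To make the idea of a node permutation concrete, the following is a minimal Python sketch, not the paper's formulation. It relabels a small adjacency-list graph according to a given order and compares a simple locality proxy before and after. The names `relabel` and `edge_span`, and the proxy itself (the sum of id distances over all edges), are assumptions chosen for illustration; the abstract does not state the objective function the authors optimize.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # node id -> list of out-neighbours


def relabel(graph: Graph, order: List[int]) -> Graph:
    """Return a copy of `graph` whose node ids follow `order`.

    `order[i]` is the original id of the node placed at position i.
    """
    new_id = {old: new for new, old in enumerate(order)}
    return {
        new_id[u]: sorted(new_id[v] for v in nbrs)
        for u, nbrs in graph.items()
    }


def edge_span(graph: Graph) -> int:
    """Hypothetical locality proxy: sum of |u - v| over all edges.

    A smaller value means the endpoints of edges sit closer together
    in the node ordering. This is an illustrative stand-in, not the
    objective used in the paper.
    """
    return sum(abs(u - v) for u, nbrs in graph.items() for v in nbrs)


# Two connected pairs, {0, 3} and {1, 2}, interleaved by their ids.
g = {0: [3], 1: [2], 2: [1], 3: [0]}

# Place each pair at adjacent positions.
reordered = relabel(g, order=[0, 3, 1, 2])

print(edge_span(g), edge_span(reordered))  # prints "8 4"
```

The graph algorithm that later runs on `reordered` is unchanged; only the node ids differ, which is the sense in which an ordering can be applied without touching algorithms or data structures.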

We prove that the graph ordering problem is NP-hard and give a basic algorithm with an approximation bound. We then propose a new algorithm that reduces the time complexity of the basic algorithm and improves efficiency through optimization techniques based on a new data structure.
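The abstract does not describe the basic algorithm or the new data structure. Purely as an illustration of what a greedy ordering heuristic can look like for a hard ordering problem, the Python sketch below repeatedly appends the unplaced node with the most neighbours among the most recently placed nodes. The function name `greedy_order`, the `window` parameter, and the scoring rule are all assumptions; this should not be read as the authors' algorithm, and it carries no approximation guarantee.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected: node id -> neighbour set


def greedy_order(graph: Graph, window: int = 2) -> List[int]:
    """Greedy ordering sketch (an assumption, not the paper's algorithm).

    Repeatedly append the unplaced node that has the most neighbours
    among the last `window` placed nodes; ties go to the smallest id.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    remaining = set(graph)
    order: List[int] = []
    while remaining:
        recent = set(order[-window:])
        # Highest overlap with the recent window first, then smallest id.
        best = min(remaining, key=lambda v: (-len(graph[v] & recent), v))
        order.append(best)
        remaining.remove(best)
    return order


# Two triangles, {0, 2, 4} and {1, 3, 5}, whose ids are interleaved.
g = {0: {2, 4}, 2: {0, 4}, 4: {0, 2}, 1: {3, 5}, 3: {1, 5}, 5: {1, 3}}

print(greedy_order(g))  # prints [0, 2, 4, 1, 3, 5]
```

This naive version rescans every unplaced node at each step, so it takes quadratic time in the number of nodes. That cost is the kind of overhead a more careful data structure would be needed to avoid on large graphs.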

We conducted extensive experiments to evaluate our approach against 9 other possible graph orderings (such as the one obtained by METIS), using 8 large real graphs and 9 representative graph algorithms. The results confirm that our approach achieves high performance by reducing the CPU cache miss ratio.

Published in
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016, 2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
General Chairs: Fatma Özcan (IBM Research, USA), Georgia Koutrika (HP Labs, USA)
Program Chair: Sam Madden (Massachusetts Institute of Technology, USA)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CPU performance
graph algorithms
graph ordering
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%