DOI: 10.1145/3132847.3133087
short-paper

PMS: an Effective Approximation Approach for Distributed Large-scale Graph Data Processing and Mining

Published: 06 November 2017

Abstract

Large-scale graph data processing and mining has recently drawn great attention, and many distributed graph processing systems have been proposed. Nevertheless, large-scale graph processing remains challenging because the computation time is still unacceptable in some cases, especially under tight time limits. As illustrated in Table 1, running the Single-Source Shortest Path algorithm on the USA-road dataset takes nearly three hours even with performant open-source distributed graph processing systems.
In this paper, we propose an effective priority-based message sampling (PMS) approach that further improves the performance of distributed graph processing at the cost of some accuracy loss. Observing that passing and processing messages dominates the computation time, our approach discards the less useful messages before they are sent, which effectively reduces the computation overhead. We implement our approach on top of Apache Giraph, a popular open-source implementation of Google's Pregel, and report preliminary results from our prototype system. The experimental results show that our approach achieves reasonable accuracy with much less computation time.
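
The abstract does not say how message priority is computed or how many messages are dropped, so the following is only an illustrative sketch of the general idea, not the authors' algorithm and not Giraph code. It runs a superstep-style Single-Source Shortest Path in plain Python and, in each superstep, delivers only the highest-priority share of candidate messages. The priority rule (how much a message would lower the receiver's current distance) and the `keep_fraction` parameter are assumptions made for this example.

```python
import math


def sssp_with_message_sampling(graph, source, keep_fraction=1.0):
    """Superstep-style SSSP that drops low-priority messages.

    graph: dict mapping vertex -> list of (neighbor, edge_weight), weights > 0
    keep_fraction: hypothetical knob, share of messages delivered per superstep
    Returns (distances, number_of_messages_delivered).
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    active = {source}
    sent = 0
    while active:
        # Each active vertex proposes a candidate distance to its neighbors.
        candidates = []
        for u in active:
            for v, w in graph[u]:
                new_d = dist[u] + w
                if new_d < dist[v]:
                    # Assumed priority: improvement the message would bring.
                    candidates.append((dist[v] - new_d, v, new_d))
        # Deliver only the highest-priority share; the rest are never sent.
        candidates.sort(key=lambda m: m[0], reverse=True)
        kept = candidates[:math.ceil(len(candidates) * keep_fraction)]
        sent += len(kept)
        active = set()
        for _, v, new_d in kept:
            if new_d < dist[v]:
                dist[v] = new_d
                active.add(v)  # v changed, so it sends in the next superstep
    return dist, sent
```

With `keep_fraction=1.0` the sketch computes exact shortest paths. With smaller values it delivers fewer messages, and every reported distance is still the length of some real path, so it can only overestimate the true distance. This mirrors the accuracy-for-time trade-off the abstract describes, under the assumptions stated above.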

Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximate computation
  2. distributed system
  3. large-scale graph processing

Conference

CIKM '17

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
