DOI: 10.1145/1989493.1989505
research-article

Filtering: a method for solving graph problems in MapReduce

Published: 04 June 2011

Abstract

The MapReduce framework is currently the de facto standard used throughout both industry and academia for petabyte-scale data analysis. As the input to a typical MapReduce computation is large, one of the key requirements of the framework is that the input cannot be stored on a single machine and must be processed in parallel. In this paper, we describe a general algorithmic design technique in the MapReduce framework called filtering. The main idea behind filtering is to reduce the size of the input in a distributed fashion so that the resulting, much smaller, problem instance can be solved on a single machine. Using this approach, we give new algorithms in the MapReduce framework for a variety of fundamental graph problems on sufficiently dense graphs. Specifically, we present algorithms for minimum spanning trees, maximal matchings, approximate weighted matchings, approximate vertex and edge covers, and minimum cuts. In all of these cases, we parameterize our algorithms by the amount of memory available on the machines, allowing us to show tradeoffs between the memory available and the number of MapReduce rounds. For each setting, we show that even if the machines are given only substantially sublinear memory, our algorithms run in a constant number of MapReduce rounds. To demonstrate the practical viability of our algorithms, we implement the maximal matching algorithm that lies at the core of our analysis and show that it achieves a significant speedup over the sequential version.
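
To make the filtering idea concrete, the following is a minimal single-process sketch of how it might be applied to maximal matching. It is not the paper's implementation: the function names, the `memory_limit` parameter (a stand-in for the number of edges one machine can hold), and the sampling probability are all assumptions chosen for illustration, and each loop iteration only simulates what would be one distributed round. Each round samples a small set of edges, extends the matching on that sample "on one machine", and then filters out every edge that already has a matched endpoint, until the surviving edges fit in memory.

```python
import random


def greedy_extend(edges, matched):
    """Extend a matching greedily: take an edge if both endpoints are free.

    `matched` is the set of already-matched vertices and is updated in place.
    """
    added = []
    for u, v in edges:
        if u not in matched and v not in matched:
            added.append((u, v))
            matched.add(u)
            matched.add(v)
    return added


def filtering_maximal_matching(edges, memory_limit, seed=0):
    """Simulate filtering for maximal matching (illustrative sketch only).

    memory_limit: hypothetical number of edges a single machine can hold.
    Returns (matching, number_of_simulated_rounds).
    """
    if memory_limit < 1:
        raise ValueError("memory_limit must be at least 1")
    rng = random.Random(seed)
    # Self-loops can never be matched, so drop them up front.
    remaining = [(u, v) for u, v in edges if u != v]
    matching, matched, rounds = [], set(), 0

    while len(remaining) > memory_limit:
        rounds += 1
        # Assumed sampling rate: expected sample size is memory_limit / 2.
        # A real implementation would also handle an oversized sample.
        p = memory_limit / (2 * len(remaining))
        sample = [e for e in remaining if rng.random() < p]
        # "Single machine" step: extend the matching using only the sample.
        matching += greedy_extend(sample, matched)
        # Filtering step: discard edges that touch a matched vertex.
        remaining = [(u, v) for u, v in remaining
                     if u not in matched and v not in matched]

    # The surviving edges now fit in memory; finish on one machine.
    rounds += 1
    matching += greedy_extend(remaining, matched)
    return matching, rounds


if __name__ == "__main__":
    gen = random.Random(1)
    demo_edges = [(u, v) for u in range(60) for v in range(u + 1, 60)
                  if gen.random() < 0.5]
    m, r = filtering_maximal_matching(demo_edges, memory_limit=100)
    print(len(demo_edges), "edges,", len(m), "matched edges,", r, "rounds")
```

The result is maximal because every input edge is either discarded after one of its endpoints is matched or survives to the final greedy pass. The sketch makes no claim about round counts; the constant-round guarantee stated in the abstract depends on the paper's own choice of parameters and analysis.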

    Published In

    SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
    June 2011
    404 pages
    ISBN:9781450307437
    DOI:10.1145/1989493
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • EATCS: European Association for Theoretical Computer Science

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. MapReduce
    2. graph algorithms
    3. matchings

    Qualifiers

    • Research-article

    Conference

    SPAA '11

    Acceptance Rates

    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Article Metrics

    • Downloads (Last 12 months): 65
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 05 Mar 2025

    Cited By

    • (2024) Streaming Graph Algorithms in the Massively Parallel Computation Model. Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, pages 496-507. DOI: 10.1145/3662158.3662770. Online publication date: 17-Jun-2024
    • (2024) Log Diameter Rounds MST Verification and Sensitivity in MPC. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 269-280. DOI: 10.1145/3626183.3659984. Online publication date: 17-Jun-2024
    • (2024) O(log log n) Passes Is Optimal for Semi-streaming Maximal Independent Set. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 847-858. DOI: 10.1145/3618260.3649763. Online publication date: 10-Jun-2024
    • (2024) On The Soundness of a Language for Large and Distributed Graph Processing. 2024 IEEE 18th International Conference on Application of Information and Communication Technologies (AICT), pages 1-6. DOI: 10.1109/AICT61888.2024.10740415. Online publication date: 25-Sep-2024
    • (2024) Component stability in low-space massively parallel computation. Distributed Computing, 37:1, pages 35-64. DOI: 10.1007/s00446-024-00461-9. Online publication date: 8-Feb-2024
    • (2024) Web Of Synonyms: An Enhanced Keyword Extraction Model For Recommendation Systems. Intelligent Computing and Big Data Analytics, pages 193-206. DOI: 10.1007/978-3-031-74682-6_13. Online publication date: 31-Dec-2024
    • (2023) A Hierarchical Grouping Algorithm for the Multi-Vehicle Dial-a-Ride Problem. Proceedings of the VLDB Endowment, 16:5, pages 1195-1207. DOI: 10.14778/3579075.3579091. Online publication date: 6-Mar-2023
    • (2023) RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches. Genome Biology, 24:1. DOI: 10.1186/s13059-023-02961-6. Online publication date: 17-May-2023
    • (2023) Exponentially Faster Massively Parallel Maximal Matching. Journal of the ACM, 70:5, pages 1-18. DOI: 10.1145/3617360. Online publication date: 11-Oct-2023
    • (2023) Engineering Massively Parallel MST Algorithms. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 691-701. DOI: 10.1109/IPDPS54959.2023.00075. Online publication date: May-2023
