DOI:10.1145/1989493.1989540
research-article

On scheduling in map-reduce and flow-shops

Published: 04 June 2011

Abstract

The map-reduce paradigm is now standard in industry and academia for processing large-scale data. In this work, we formalize job scheduling in map-reduce as a novel generalization of the two-stage classical flexible flow shop (FFS) problem: instead of a single task at each stage, a job now consists of a set of tasks per stage. For this generalization, we consider the problem of minimizing the total flowtime and give an efficient 12-approximation in the offline setting and an online (1+µ)-speed O(1/µ²)-competitive algorithm.
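To make the objective concrete, the following is a minimal Python sketch, under simplifying assumptions of our own, of how the total flowtime of one map-reduce schedule can be evaluated. It assumes each job has a release time, a set of map (stage-1) task lengths and a set of reduce (stage-2) task lengths; the two stages run on separate pools of identical machines; a job's reduce tasks may not start before all of its map tasks finish; and jobs are served non-preemptively in a fixed priority order, with each task sent to the machine that frees up earliest. The names `Job`, `total_flowtime`, `m_map` and `m_reduce` are hypothetical, and this greedy list-scheduling evaluator is only an illustration of the model, not the paper's 12-approximation algorithm.

```python
import heapq
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Job:
    release: float
    map_tasks: List[float]     # processing times of the job's stage-1 (map) tasks
    reduce_tasks: List[float]  # processing times of the job's stage-2 (reduce) tasks


def _run_stage(free: List[float], tasks: Sequence[float], ready: float) -> float:
    """Place each task on the machine that frees up earliest, never before `ready`.

    `free` is a min-heap of machine-available times (must be non-empty) and is
    updated in place. Returns when the last of these tasks finishes, or `ready`
    if there are no tasks.
    """
    done = ready
    for p in tasks:
        start = max(heapq.heappop(free), ready)
        finish = start + p
        heapq.heappush(free, finish)
        done = max(done, finish)
    return done


def total_flowtime(jobs: Sequence[Job], order: Sequence[int],
                   m_map: int, m_reduce: int) -> float:
    """Sum of (completion - release) over all jobs, served in the given order."""
    map_free = [0.0] * m_map        # a list of equal values is already a valid heap
    reduce_free = [0.0] * m_reduce
    total = 0.0
    for j in order:
        job = jobs[j]
        maps_done = _run_stage(map_free, job.map_tasks, job.release)
        # Precedence between stages: reduce tasks wait for all of the job's map tasks.
        completion = _run_stage(reduce_free, job.reduce_tasks, maps_done)
        total += completion - job.release
    return total
```

For two jobs on two map machines and one reduce machine, `jobs = [Job(0, [2, 2], [3]), Job(0, [1], [1])]`, serving job 0 first gives a total flowtime of 11.0, while serving the smaller job 1 first gives 8.0, which shows why the job order matters for this objective.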
Motivated by map-reduce, we revisit the two-stage flow shop problem, for which we give a dynamic program that minimizes the total flowtime when all jobs arrive at the same time. If there is a fixed number of job types, the dynamic program yields a PTAS; it is also a QPTAS when the processing times of jobs are polynomially bounded. This gives the first improvement in the approximation of flowtime for the two-stage flow shop problem since the trivial 2-approximation algorithm of Gonzalez and Sahni [29] in 1978, and the first known approximation for the FFS problem. We then consider the generalization of the two-stage FFS problem to the unrelated machines case, where we give an offline 6-approximation and an online (1+µ)-speed O(1/µ⁴)-competitive algorithm.
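In the classical two-stage flow shop with all jobs released at time 0, each job j has one task of length a_j on the stage-1 machine and one task of length b_j on the stage-2 machine, and its flowtime equals its completion time. The sketch below is a brute-force illustration that assumes permutation schedules (the same job order on both machines): it evaluates an order with the recurrence C1 = C1 + a_j, C2 = max(C2, C1) + b_j and then tries every order. It is exponential in the number of jobs and is not the paper's dynamic program; the function names are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def total_flowtime_f2(jobs: Sequence[Tuple[int, int]], order: Sequence[int]) -> int:
    """jobs[j] = (a_j, b_j): processing times on stage 1 and stage 2.

    All jobs are released at time 0, so each job's flowtime is its completion time.
    """
    c1 = c2 = 0
    total = 0
    for j in order:
        a, b = jobs[j]
        c1 += a               # the stage-1 machine runs jobs back to back
        c2 = max(c2, c1) + b  # stage 2 waits for its machine and for the job's stage-1 task
        total += c2
    return total


def best_permutation(jobs: Sequence[Tuple[int, int]]) -> Tuple[int, ...]:
    """Exhaustive search over job orders; only practical for a handful of jobs."""
    return min(permutations(range(len(jobs))),
               key=lambda order: total_flowtime_f2(jobs, order))
```

With `jobs = [(3, 1), (1, 3), (2, 2)]`, the order (0, 1, 2) has a total flowtime of 20, while `best_permutation(jobs)` returns (1, 0, 2) with a total flowtime of 17.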

References

[1]
F. Afrati, E. Bampis, C. Chekuri, D. Karger, C. Kenyon, S. Khanna, I. Milis, M. Queyranne, M. Skutella, and C. Stein. Approximation schemes for minimizing average weighted completion time with release dates. In Proc. 40th FOCS, pages 32--44, 1999.
[2]
N. Avrahami and Y. Azar. Minimizing total flow time and total completion time with immediate dispatching. Algorithmica, 47(3):253--268, 2007.
[3]
N. Bansal, R. Krishnaswamy, and V. Nagarajan. Better scalable algorithms for broadcast scheduling. In Proc. 37th ICALP, pages 324--335, 2010.
[4]
J. S. Chadha, N. Garg, A. Kumar, and V. N. Muralidhara. A competitive algorithm for minimizing weighted flow time on unrelated machines with speed augmentation. In Proc. 41st STOC, pages 679--684, 2009.
[5]
C. Chekuri, A. Goel, S. Khanna, and A. Kumar. Multi-processor scheduling to minimize flow time with epsilon resource augmentation. In Proc. 36th STOC, pages 363--372, 2004.
[6]
C. Chekuri and B. Moseley. Online scheduling to minimize the maximum delay factor. In Proc. 20th SODA, pages 1116--1125, 2009.
[7]
F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In Proc. 19th WWW, pages 231--240, 2010.
[8]
C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In Proc. 20th NIPS, pages 281--288, 2006.
[9]
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51:107--113, 2008.
[10]
J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein, and Z. Svitkina. On distributing symmetric streaming computations. In Proc. 19th SODA, pages 710--719, 2008.
[11]
M. J. Fischer, X. Su, and Y. Yin. Assigning tasks for efficiency in hadoop: Extended abstract. In Proc. 22nd SPAA, pages 30--39, 2010.
[12]
A. Ganapathi, Y. Chen, A. Fox, R. Katz, and D. Patterson. Statistics-driven workload modeling for the Cloud. In Proc. Data Engineering Workshops at 26th ICDE, pages 87--92, 2010.
[13]
M. R. Garey, D. S. Johnson, and R. Sethi. The complexity of flowshop and jobshop scheduling. Mathematics of Operations Research, 1:117--129, 1976.
[14]
N. Garg and A. Kumar. Minimizing average flow-time: Upper and lower bounds. In Proc. 48th FOCS, pages 603--613, 2007.
[15]
N. Garg, A. Kumar, and V. N. Muralidhara. Minimizing total flow-time: The unrelated case. In Proc. 19th ISAAC, pages 424--435, 2008.
[16]
L. A. Hall. Approximability of flow shop scheduling. In Proc. 36th FOCS, pages 82--91, 1995.
[17]
C. Hepner and C. Stein. Implementation of a PTAS for scheduling with release dates. Algorithm Engineering and Experimentation, pages 202--215, 2001.
[18]
S. Im and B. Moseley. An online scalable algorithm for minimizing ℓ_k-norms of weighted flow time on unrelated machines. In Proc. 22nd SODA, pages 95--108, 2011.
[19]
M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair scheduling for distributed computing clusters. In Proc. 22nd SOSP, pages 261--276, 2009.
[20]
S. M. Johnson. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly, 1:69--81, 1954.
[21]
B. Kalyanasundaram and K. Pruhs. Speed is as powerful as clairvoyance. J. ACM, 47(4):617--643, 2000.
[22]
U. Kang, C. E. Tsourakakis, A. Appel, C. Faloutsos, and J. Leskovec. HADI: Fast diameter estimation and mining in massive graphs with Hadoop. Technical Report CMU-ML-08-117, Carnegie Mellon University, 2008.
[23]
D. Karger, C. Stein, and J. Wein. Scheduling algorithms. In M. Atallah, editor, Handbook on Algorithms and Theory of Computation, chapter 34. Chapman and Hall/CRC, 1999.
[24]
H. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for MapReduce. In Proc. 21st SODA, pages 938--948, 2010.
[25]
S. Leonardi and D. Raz. Approximating total flow time on parallel machines. JCSS, 73(6):875--891, 2007.
[26]
J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Number 7 in Synthesis Lectures on Human Language Technologies. Morgan and Claypool, 2010.
[27]
K. Pruhs, J. Sgall, and E. Torng. Handbook of Scheduling: Algorithms, Models, and Performance Analysis, chapter Online Scheduling. CRC Press, 2004.
[28]
T. Sandholm and K. Lai. MapReduce optimization using regulated dynamic prioritization. In Proc. 11th SIGMETRICS, pages 299--310, 2009.
[29]
T. Gonzalez and S. Sahni. Flowshop and jobshop schedules: Complexity and approximation. Operations Research, 26:36--52, 1978.
[30]
P. Schuurman and G. J. Woeginger. Polynomial time approximation algorithms for machine scheduling: ten open problems. Journal of Scheduling, 2(5):203--213, 1999.
[31]
P. Schuurman and G. J. Woeginger. A polynomial time approximation scheme for the two-stage multiprocessor flow shop problem. TCS, 237(1-2):105--122, 2000.
[32]
M. Skutella. Convex quadratic and semidefinite programming relaxations in scheduling. J. ACM, 48(2):206--242, 2001.
[33]
J. L. Wolf, D. Rajan, K. Hildrum, R. Khandekar, V. Kumar, S. Parekh, K.-L. Wu, and A. Balmin. FLEX: A slot allocation scheduling optimizer for MapReduce workloads. In Middleware, pages 1--20, 2010.
[34]
M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proc. 5th EuroSys, pages 265--278, 2010.
[35]
M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In Proc. USENIX OSDI, 2008.



Published In

SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
June 2011
404 pages
ISBN:9781450307437
DOI:10.1145/1989493
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. algorithm analysis
  2. approximation algorithms
  3. flow-shops
  4. map-reduce
  5. on-line problems
  6. scheduling and resource allocation

Conference

SPAA '11

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)1
Reflects downloads up to 05 Mar 2025

Cited By

  • (2024) Online makespan minimization for MapReduce scheduling on multiple parallel machines. Demonstratio Mathematica, 57:1. DOI: 10.1515/dema-2024-0040. Online publication date: 1-Nov-2024.
  • (2024) Q-scheduler: Optimize Job Scheduling in Hadoop with Reinforcement Learning. 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pages 2503-2508. DOI: 10.1109/CSCWD61410.2024.10580207. Online publication date: 8-May-2024.
  • (2023) Scheduling distributed multiway spatial join queries: optimization models and algorithms. International Journal of Geographical Information Science, 37:6, pages 1388-1419. DOI: 10.1080/13658816.2023.2170380. Online publication date: 6-Feb-2023.
  • (2023) Improvement of Makespan and TCTime in Dynamic Job Ordering and Slot Utilization for MapReduce Workloads. Intelligent Computing and Networking, pages 95-110. DOI: 10.1007/978-981-99-0071-8_9. Online publication date: 22-Mar-2023.
  • (2022) Total weighted tardiness for scheduling MapReduce jobs on parallel batch machines. Journal of Industrial and Management Optimization. DOI: 10.3934/jimo.2022201. Online publication date: 2022.
  • (2022) Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters. Proceedings of the 51st International Conference on Parallel Processing, pages 1-11. DOI: 10.1145/3545008.3545093. Online publication date: 29-Aug-2022.
  • (2022) Speed Scaling on Parallel Servers With MapReduce Type Precedence Constraints. IEEE/ACM Transactions on Networking, 30:4, pages 1509-1524. DOI: 10.1109/TNET.2022.3142091. Online publication date: Aug-2022.
  • (2022) Non-Intrusive and Workflow-Aware Virtual Network Function Scheduling in User-Space. IEEE Transactions on Cloud Computing, 10:3, pages 1975-1990. DOI: 10.1109/TCC.2020.3024232. Online publication date: 1-Jul-2022.
  • (2022) Reducing Average Job Completion Time for DAG-style Jobs by Adding Idle Slots. GLOBECOM 2022 - 2022 IEEE Global Communications Conference, pages 4504-4509. DOI: 10.1109/GLOBECOM48099.2022.10001196. Online publication date: 4-Dec-2022.
  • (2021) Energy and SLA-driven MapReduce Job Scheduling Framework for Cloud-based Cyber-Physical Systems. ACM Transactions on Internet Technology, 21:2, pages 1-24. DOI: 10.1145/3409772. Online publication date: 3-May-2021.
