AlloX: Allocation across Computing Resources for Hybrid CPU/GPU clusters

Published: 17 January 2019

Abstract

GPUs are widely used as accelerators for CPUs; we call applications that can exploit them GPU applications. Some machine learning frameworks, such as TensorFlow, support running machine learning (ML) jobs on either CPUs or GPUs. Nvidia claims that a K80 GPU (12 GB) delivers a 5-10x speedup on average. Although GPUs offer a performance advantage, they are expensive: a K80 GPU costs roughly $4,000, while a quad-core Intel Xeon E5 costs about $350.
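As a back-of-the-envelope illustration of this trade-off (a sketch using only the list prices and the claimed 5-10x speedup quoted above; real throughput depends heavily on the workload), one can compare normalized throughput per dollar:

```python
# Rough throughput-per-dollar comparison using the figures quoted above.
cpu_price, gpu_price = 350.0, 4000.0   # USD: quad-core Intel Xeon E5 vs. K80 GPU
cpu_throughput = 1.0                   # throughput of one CPU, normalized to 1

for speedup in (5.0, 10.0):            # Nvidia's claimed 5-10x speedup range
    gpu_throughput = speedup * cpu_throughput
    print(f"{speedup:>4.1f}x speedup: "
          f"CPU {cpu_throughput / cpu_price:.5f} vs. "
          f"GPU {gpu_throughput / gpu_price:.5f} units of throughput per dollar")
```

Under these illustrative numbers, a CPU still delivers slightly more throughput per dollar even at a 10x speedup (1/350 ≈ 0.0029 vs. 10/4000 = 0.0025), which is part of why mixing CPUs and GPUs, and placing each job on the right device, matters.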

The coexistence of traditional CPU applications and GPU applications urges cloud operators to build hybrid CPU/GPU clusters. While traditional applications execute only on CPUs, GPU applications can run on either CPUs or GPUs. This raises two questions: how should a hybrid CPU/GPU cluster be provisioned for CPU and GPU applications, and how should resources be allocated across CPUs and GPUs?

Interchangeable resources like CPUs and GPUs are not rare in large clusters. Network I/O devices such as wireless, Ethernet, and InfiniBand cards with different bandwidths can also be interchangeable.

In this paper, we focus on CPU/GPU systems. We develop a tool that estimates the performance and resource demand of an ML job in an online manner (§2). We implement the AlloX system, which allocates resources and places applications on the right resource type (CPU or GPU) to maximize the use of computational resources (§3). The proposed AlloX policy improves progress by up to 35% compared to default DRF [2]. We also build a model that minimizes the total cost of ownership of CPU/GPU data centers (§4).
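To make the placement idea concrete, here is a minimal greedy sketch, assuming each job already has runtime estimates on CPU and GPU (e.g., from the online estimator of §2). All names and numbers are hypothetical, and this is an illustration rather than the paper's actual AlloX policy, which also has to handle fairness and online job arrivals.

```python
# Illustrative sketch only (not the actual AlloX policy): place each job on the
# device type where its estimated speedup is largest, while capacity remains.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cpu_time: float   # estimated completion time on a CPU slot (seconds)
    gpu_time: float   # estimated completion time on a GPU (seconds)

def place_jobs(jobs, cpu_slots, gpu_slots):
    """Greedy placement: jobs with the highest GPU speedup claim GPUs first;
    assumes cpu_slots + gpu_slots >= len(jobs)."""
    placement = {}
    for job in sorted(jobs, key=lambda j: j.cpu_time / j.gpu_time, reverse=True):
        if gpu_slots > 0 and job.gpu_time < job.cpu_time:
            placement[job.name] = "gpu"
            gpu_slots -= 1
        elif cpu_slots > 0:
            placement[job.name] = "cpu"
            cpu_slots -= 1
        else:
            placement[job.name] = "gpu"   # only GPU capacity is left
            gpu_slots -= 1
    return placement

# Hypothetical jobs: a strongly GPU-friendly training job, a mildly
# GPU-friendly job, and a job that is actually faster on CPU.
jobs = [Job("resnet", 100.0, 12.0), Job("word2vec", 40.0, 30.0), Job("etl", 20.0, 25.0)]
print(place_jobs(jobs, cpu_slots=2, gpu_slots=1))
# -> {'resnet': 'gpu', 'word2vec': 'cpu', 'etl': 'cpu'}
```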

References

  1. O. Alipourfard, H. H. Liu, J. Chen, S. Venkataraman, M. Yu, and M. Zhang. CherryPick: Adaptively unearthing the best cloud configurations for big data analytics. In NSDI, 2017.
  2. A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In NSDI, 2011.
  3. T. N. Le, Z. Liu, Y. Chen, and C. Bash. Joint capacity planning and operational management for sustainable data centers and demand response. In Proceedings of the Seventh International Conference on Future Energy Systems. ACM, 2016.
  4. S. Venkataraman, Z. Yang, M. J. Franklin, B. Recht, and I. Stoica. Ernest: Efficient performance prediction for large-scale advanced analytics. In NSDI, pages 363--378, 2016.

Published in

ACM SIGMETRICS Performance Evaluation Review, Volume 46, Issue 2, September 2018, 95 pages
ISSN: 0163-5999
DOI: 10.1145/3305218
Copyright © 2019 Authors
Publisher: Association for Computing Machinery, New York, NY, United States
