DOI: 10.1145/2822332.2822335

Towards efficient scheduling of data intensive high energy physics workflows

Published: 15 November 2015

ABSTRACT

Data intensive high energy physics workflows executed on geographically distributed resources pose a tremendous challenge for efficient use of computing resources. In this early-work paper, we present a hierarchical framework for efficient allocation of resources and energy-efficient assignment of tasks for a representative high energy physics application, the Belle II experiment. With an expected data rate of 25 petabytes per year from experimental data and Monte Carlo simulations, the Belle II experiment provides an ideal platform for algorithmic development. Building on an analogy with the unit commitment problem in electric power grids, we present a novel cost-efficient method for resource allocation that feeds into energy-efficient assignment of tasks to resources using a semi-matching based algorithm. We demonstrate that this approach is both computationally efficient and effective. We expect the methods developed in this work to benefit Belle II and other complex workflows executed on distributed resources.
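The abstract refers to a semi-matching based assignment of tasks to resources. As a rough illustration only, and not the authors' algorithm, the Python sketch below shows a greedy semi-matching heuristic: every task is matched to exactly one eligible resource, chosen to keep the estimated finishing time of the most loaded resource low. All task names, workloads, and site speeds in the example are hypothetical.

from collections import defaultdict

def greedy_semi_matching(tasks, eligible, work, speed):
    """Assign each task to exactly one of its eligible resources (a semi-matching).

    tasks    : list of task ids
    eligible : dict task -> list of resource ids the task may run on
    work     : dict task -> amount of work (e.g. events to process)
    speed    : dict resource -> processing rate

    Returns dict task -> resource, chosen greedily so that the estimated
    finishing time of the most loaded resource stays low.
    """
    load = defaultdict(float)      # accumulated work per resource
    assignment = {}
    # Place the largest tasks first, a common heuristic for load balance.
    for t in sorted(tasks, key=lambda t: -work[t]):
        best = min(eligible[t], key=lambda r: (load[r] + work[t]) / speed[r])
        assignment[t] = best
        load[best] += work[t]
    return assignment

if __name__ == "__main__":
    # Hypothetical Monte Carlo production and analysis tasks on three sites.
    tasks = ["mc1", "mc2", "mc3", "ana1"]
    eligible = {"mc1": ["siteA", "siteB"], "mc2": ["siteA"],
                "mc3": ["siteB", "siteC"], "ana1": ["siteC"]}
    work = {"mc1": 40, "mc2": 25, "mc3": 60, "ana1": 10}
    speed = {"siteA": 2.0, "siteB": 1.5, "siteC": 1.0}
    print(greedy_semi_matching(tasks, eligible, work, speed))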


  • Published in

    WORKS '15: Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science
    November 2015
    98 pages
    ISBN: 9781450339896
    DOI: 10.1145/2822332

    Copyright © 2015 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 15 November 2015


    Qualifiers

    • research-article

    Acceptance Rates

    WORKS '15 Paper Acceptance Rate: 9 of 13 submissions, 69%. Overall Acceptance Rate: 30 of 54 submissions, 56%.
