research-article

Server-Side Workload Identification for HPC I/O Requests

Authors:
Lu Pang, Temple University, Philadelphia, PA, USA
Krishna Kant, Temple University, Philadelphia, PA, USA
PERMAVOST '22: Proceedings of the 2nd Workshop on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn Strategy, June 2022, Pages 15–22
https://doi.org/10.1145/3526063.3535350

Published: 27 June 2022

ABSTRACT

In this paper, we develop a method to identify High Performance Computing (HPC) workloads from a stream of incoming I/O requests. The resulting workload characterization can then be used to schedule I/O requests intelligently in the parallel file system (PFS) that most HPC systems use. For this purpose, we use a deep learning model designed to detect changes in the workload as they occur. We show that our method accurately determines workload characteristics when evaluated on publicly available server-side HPC traces. We also show that I/O scheduling based on this characterization can substantially increase the available I/O bandwidth and thus reduce I/O latency for HPC workloads.
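The abstract describes characterizing workloads from a stream of server-side I/O requests and reacting as the workload changes. To illustrate only the shape of that pipeline, here is a minimal Python sketch that cuts a request stream into fixed-size, non-overlapping windows, computes two simple features per window, and reports each point where the window's label changes. It is not the authors' deep learning model: the `Request` record, the two features, the 0.8 threshold, and the window size of 8 are all hypothetical stand-ins for a trained classifier.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Request:
    # Hypothetical server-side request record: byte offset and size in a file.
    offset: int
    size: int


def window_features(window: List[Request]) -> Tuple[float, float]:
    """Return (mean request size in bytes, fraction of contiguous requests)."""
    mean_size = sum(r.size for r in window) / len(window)
    # A request is contiguous if it starts exactly where the previous one ended.
    contiguous = sum(
        1 for prev, cur in zip(window, window[1:])
        if cur.offset == prev.offset + prev.size
    )
    pairs = max(len(window) - 1, 1)
    return mean_size, contiguous / pairs


def label(features: Tuple[float, float], seq_threshold: float = 0.8) -> str:
    # Assumption: a hand-set threshold stands in for a learned classifier.
    _, contiguous_frac = features
    return "sequential" if contiguous_frac >= seq_threshold else "random"


def detect_changes(
    stream: Iterable[Request], window_size: int = 8
) -> Iterator[Tuple[int, str]]:
    """Yield (index of the window's first request, new label) on each change.

    Uses tumbling (non-overlapping) windows; a trailing partial window is ignored.
    """
    window: List[Request] = []
    current: Optional[str] = None
    for i, req in enumerate(stream):
        window.append(req)
        if len(window) == window_size:
            new = label(window_features(window))
            if new != current:
                yield i - window_size + 1, new
                current = new
            window.clear()


if __name__ == "__main__":
    # 16 back-to-back 4 KiB requests, then 16 widely scattered ones.
    seq = [Request(k * 4096, 4096) for k in range(16)]
    rnd = [Request(k * 1_000_000, 4096) for k in range(16, 0, -1)]
    for start, kind in detect_changes(seq + rnd):
        print(start, kind)  # prints "0 sequential", then "16 random"
```

Tumbling windows keep the sketch simple; a sliding window would report changes with finer granularity at the cost of recomputing features for every request.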

Supplemental Material
PERMAVOST22-perma02.mp4 (MP4 video, 180.2 MB)

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information storage systems

Published in
PERMAVOST '22: Proceedings of the 2nd Workshop on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn Strategy
June 2022
30 pages
ISBN: 9781450393140
DOI: 10.1145/3526063
Program Chairs:
Connor Scully-Allison, University of Arizona, USA
Radita Liem, RWTH Aachen University, Germany
Ana Veroneze Solorzano, Northeastern University
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
hpc
scheduling
workload identification