OSTI.GOV
U.S. Department of Energy
Office of Scientific and Technical Information

Title: On Analytics of File Transfer Rates over Dedicated Wide-Area Connections

Conference

File transfers between decentralized storage sites over dedicated wide-area connections are becoming increasingly important in high-performance computing and big data scenarios. Designing scientific workflows for such large file transfers is extremely challenging because their performance depends on the file, I/O, host, and local- and wide-area network subsystems, and on the interactions among them. To gain insights into file-transfer rate profiles, we develop polynomial, bagging, and boosting regression models for Lustre and XFS file-transfer measurements, which are collected using XDD over a suite of 10 Gbps connections with round-trip times (RTTs) of 0-366 ms. In addition to overall trends and analytics, these regressions provide file-transfer rate estimates for RTTs and numbers of parallel flows at which measurements might not have been collected. The regressions show that bagging and boosting provide closer fits to the data than polynomial regression. We develop probabilistic bounds on the generalization error of these methods, which, combined with the cross-validation error, establish that the former two are more accurate estimators than polynomial regression. In addition, we present a method to efficiently determine the number of parallel flows that achieves the peak file-transfer rate using fewer measurements than a full sweep; in our measurements, the peak is achieved in 96% of cases with only 15-25% of the measurements of a full sweep.
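To make the comparison of polynomial, bagging, and boosting regression concrete, here is a minimal, self-contained sketch in plain Python. It is not the authors' implementation and uses no real measurements: the data come from a made-up rate-versus-RTT curve with Gaussian noise, the number of parallel flows is assumed fixed so RTT is the only input, and the helper names (`make_synthetic_data`, `fit_poly`, `fit_tree`, `fit_bagging`, `fit_boosting`, `cv_mse`) are hypothetical. Using depth-limited regression trees as the base learner for both bagging and least-squares gradient boosting, and the chosen settings (degree 3, 25 bagged trees, 50 boosting rounds with learning rate 0.1), are likewise assumptions. Each estimator is scored by 5-fold cross-validation mean squared error.

```python
import random

# Hypothetical RTT grid (seconds) spanning 0-366 ms; illustrative values only.
RTTS = [0.0, 0.01, 0.02, 0.05, 0.09, 0.18, 0.366]


def make_synthetic_data(seed=42, repeats=10):
    """Synthetic rate (Gbps) vs. RTT with Gaussian noise; NOT the XDD measurements."""
    rng = random.Random(seed)
    xs, ys = [], []
    for rtt in RTTS:
        for _ in range(repeats):
            xs.append(rtt)
            ys.append(9.0 / (1.0 + 20.0 * rtt ** 2) + rng.gauss(0.0, 0.3))
    return xs, ys


def fit_poly(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations and Gaussian elimination."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A with A[k][i] = x_k**i.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        coef[i] = (m[i][n] - sum(m[i][j] * coef[j] for j in range(i + 1, n))) / m[i][i]
    return lambda x: sum(c * x ** k for k, c in enumerate(coef))


def sse(values):
    """Sum of squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def fit_tree(xs, ys, depth):
    """Depth-limited 1-D regression tree; each split minimizes the summed squared error."""
    mean = sum(ys) / len(ys)
    thresholds = sorted(set(xs))[:-1]  # splitting at the max would leave an empty side
    if depth == 0 or not thresholds:
        return lambda x: mean

    def cost(t):
        return (sse([y for x, y in zip(xs, ys) if x <= t])
                + sse([y for x, y in zip(xs, ys) if x > t]))

    t = min(thresholds, key=cost)
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    fl = fit_tree([x for x, _ in left], [y for _, y in left], depth - 1)
    fr = fit_tree([x for x, _ in right], [y for _, y in right], depth - 1)
    return lambda x: fl(x) if x <= t else fr(x)


def fit_bagging(xs, ys, n_models=25, depth=3, seed=0):
    """Bagging: average trees fitted to bootstrap resamples of the training data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx], depth))
    return lambda x: sum(mdl(x) for mdl in models) / len(models)


def fit_boosting(xs, ys, n_rounds=50, depth=2, lr=0.1):
    """Least-squares gradient boosting: each shallow tree is fitted to current residuals."""
    base = sum(ys) / len(ys)
    resid = [y - base for y in ys]
    models = []
    for _ in range(n_rounds):
        mdl = fit_tree(xs, resid, depth)
        models.append(mdl)
        resid = [r - lr * mdl(x) for x, r in zip(xs, resid)]
    return lambda x: base + lr * sum(mdl(x) for mdl in models)


def cv_mse(fit, xs, ys, k=5, seed=1):
    """k-fold cross-validation mean squared error of the estimator returned by fit."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    errs = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errs += [(model(xs[i]) - ys[i]) ** 2 for i in fold]
    return sum(errs) / len(errs)


xs, ys = make_synthetic_data()
results = {
    "polynomial": cv_mse(fit_poly, xs, ys),
    "bagging": cv_mse(fit_bagging, xs, ys),
    "boosting": cv_mse(fit_boosting, xs, ys),
}
for name, err in results.items():
    print(f"{name:>10}: 5-fold CV MSE = {err:.4f}")
```

Because the data are synthetic, the printed errors say nothing about which method wins on the Lustre and XFS measurements; the sketch only illustrates the workflow of fitting each estimator and comparing held-out error. For real analyses, library estimators such as scikit-learn's `BaggingRegressor` and `GradientBoostingRegressor` would normally replace the hand-written learners.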
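The abstract does not describe how the peak is located from a partial sweep. One simple way to illustrate the idea of finding the best number of parallel flows without measuring every value is a ternary search over the flow count. This is a swapped-in technique, not necessarily the paper's method, and it is valid only under the assumptions that the rate profile is unimodal in the number of flows and that each measurement is reliable (for example, averaged over repeated transfers). The profile `synthetic_rate`, the helper `find_peak_flows`, and the 1-20 flow sweep are hypothetical.

```python
def synthetic_rate(flows):
    """Hypothetical concave rate profile (Gbps) vs. parallel flows; not measured data."""
    return 9.0 * flows / (flows + 2.0) - 0.15 * flows


def find_peak_flows(measure, lo, hi):
    """Ternary search for the flow count maximizing a unimodal rate profile.

    Returns (best_flow_count, number_of_distinct_measurements_taken).
    """
    cache = {}

    def rate(n):
        if n not in cache:  # each new flow count costs one (hypothetical) measurement
            cache[n] = measure(n)
        return cache[n]

    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if rate(m1) < rate(m2):
            lo = m1 + 1  # under unimodality, the peak lies to the right of m1
        else:
            hi = m2  # under unimodality, the peak lies at or to the left of m2
    best = max(range(lo, hi + 1), key=rate)
    return best, len(cache)


FULL_SWEEP = range(1, 21)  # hypothetical full sweep: 1..20 parallel flows
best, used = find_peak_flows(synthetic_rate, FULL_SWEEP.start, FULL_SWEEP.stop - 1)
print(f"peak at {best} flows using {used} of {len(FULL_SWEEP)} measurements")
```

On real transfers, measurement noise and non-unimodal profiles can mislead a search like this, so the fraction of the sweep it needs should not be read as comparable to the 15-25% reported above.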

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1435302
Resource Relation:
Conference: First International Workshop on Workflow Science (WoWS), Auckland, New Zealand, 10/24/2017-10/27/2017
Country of Publication:
United States
Language:
English
