Co-sites: the autonomous distributed dataflows in collaborative scientific discovery

Published: 15 November 2015


Online "big data" processing applications have seen increasing importance in the high performance computing domain, including online analytics of large volumes of data output by various scientific applications.
This work contributes to answering the question of how to promote efficient collaborative science in the face of unpredictable analytics workloads and dynamics in available resources. It proposes the Co-Sites solution, which employs online resource management at the sites participating in online collaboration, including geographically distributed sites that may be spread across large distances. Co-Sites operates by having each site observe its local progress and make its own decisions to better utilize local resources and to maintain acceptable rates of global progress. Co-Sites further enriches such distributed data flows to permit just-in-time data sharing to better leverage collaborators' diverse domain expertise.
Experiments with a combustion workflow demonstrate that the Co-Sites solution achieves (i) improved end-to-end completion times, (ii) good scalability, and (iii) good data sharing latencies.


WORKS '15: Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science
November 2015
98 pages
