DOI: 10.1145/2822332.2822337
research-article

Co-sites: the autonomous distributed dataflows in collaborative scientific discovery

Published: 15 November 2015

Abstract

Online "big data" processing applications have become increasingly important in the high performance computing domain, including the online analysis of large volumes of data output by scientific applications.
This work addresses the question of how to enable efficient collaborative science in the face of unpredictable analytics workloads and dynamically varying resource availability. It proposes Co-Sites, a solution that performs online resource management at each site participating in an online collaboration, including geographically distributed sites separated by large distances. Co-Sites operates by having each site observe its local progress and make its own decisions, both to better utilize local resources and to maintain acceptable rates of global progress. Co-Sites further enriches these distributed dataflows to permit just-in-time data sharing, better leveraging collaborators' diverse domain expertise.
Experiments with a combustion workflow demonstrate that Co-Sites achieves (i) improved end-to-end completion times, (ii) good scalability, and (iii) low data-sharing latencies.
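The per-site operating principle described above (each site observing only its local progress and autonomously adapting its resource allocation, without global coordination) can be illustrated with a minimal sketch. All names here (`Site`, `target_rate`, the scaling thresholds) are hypothetical illustrations, not identifiers from the paper:

```python
# Hypothetical sketch of a per-site control loop in the spirit of the
# abstract: a site tracks its local processing rate and scales its own
# workers to keep pace with a rate needed for acceptable global progress.
# The class, fields, and thresholds are illustrative assumptions.

class Site:
    def __init__(self, name, workers, target_rate):
        self.name = name
        self.workers = workers          # locally managed analytics workers
        self.target_rate = target_rate  # items/sec needed for global progress
        self.processed = 0
        self.elapsed = 0.0

    def observe(self, items_done, seconds):
        """Record local progress for one monitoring interval."""
        self.processed += items_done
        self.elapsed += seconds

    def local_rate(self):
        """Observed local throughput in items/sec."""
        return self.processed / self.elapsed if self.elapsed else 0.0

    def adapt(self):
        """Autonomous local decision: scale workers so the observed rate
        tracks the target, with no coordination across sites."""
        rate = self.local_rate()
        if rate < 0.9 * self.target_rate:
            self.workers += 1           # falling behind: add a worker
        elif rate > 1.2 * self.target_rate and self.workers > 1:
            self.workers -= 1           # over-provisioned: release a worker
        return self.workers

site = Site("site-a", workers=4, target_rate=100.0)
site.observe(items_done=400, seconds=5.0)   # observed 80 items/sec
print(site.adapt())                         # below target, so scales up
```

The design choice this mirrors is purely local feedback: each site needs only its own measurements and a shared rate target, which is what makes the sites autonomous yet collectively able to sustain global progress.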


Cited By

  • (2016) Picky: Efficient and Reproducible Sharing of Large Datasets Using Merkle-Trees. 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 30-38. DOI: 10.1109/MASCOTS.2016.25. Online publication date: Sep 2016.

Published In

WORKS '15: Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science
November 2015
98 pages
ISBN:9781450339896
DOI:10.1145/2822332
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Conference

SC15

Acceptance Rates

WORKS '15 Paper Acceptance Rate 9 of 13 submissions, 69%;
Overall Acceptance Rate 30 of 54 submissions, 56%
