research-article

Design and evaluation of a self-healing Kepler for scientific workflows

Authors:
Arjun Hary, The University of Arizona, Tucson, AZ
Ali Akoglu, The University of Arizona, Tucson, AZ
Youssif AlNashif, The University of Arizona, Tucson, AZ
Salim Hariri, The University of Arizona, Tucson, AZ
Darrel Jenerette, University of California, Riverside

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, June 2010, Pages 340–343
https://doi.org/10.1145/1851476.1851525

Published: 21 June 2010

ABSTRACT

Kepler is a popular open-source scientific workflow (SWF) system because it simplifies the construction of complex data-flow models through a visual interface. As the workflow applications that run on heterogeneous distributed systems grow more complex, fault management becomes a critical design issue for large-scale scientific and engineering applications. Because these applications have long execution times, it is important that they be fault tolerant; i.e., the workflow application should recover gracefully from faults without having to restart from the beginning. The current implementation of Kepler does not support fault tolerance or recovery mechanisms. In this paper, we extend Kepler to support fault-tolerant scientific workflows (FT-SWF) with a checkpoint mechanism in which corrective measures are taken seamlessly and autonomically whenever a fault is detected. To the best of our knowledge, this is the first approach to adding autonomic operations to Kepler. We evaluated FT-Kepler on a distributed application used by ecosystem researchers, measuring the workflow's performance under hardware- and software-based fault scenarios in terms of execution time, recovery time, and checkpoint overhead. The experimental evaluations indicate that the checkpoint mechanism adds negligible overhead to the total execution time of the workflow and that, as the fault rate increases, the number of checkpoints should be increased.
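The checkpoint-and-recover idea described in the abstract can be illustrated with a small sketch. This is not FT-Kepler's implementation and does not use any Kepler API; the function name `run_with_checkpoints`, its parameters, and the fault-injection scheme are all hypothetical, chosen only to show why resuming from a checkpoint costs fewer re-executed stages than restarting from the beginning.

```python
def run_with_checkpoints(stages, data, checkpoint_every, faults):
    """Run `stages` in order, saving state after every `checkpoint_every` stages.

    `faults` is a set of 1-based attempt numbers at which a fault is injected
    (a hypothetical stand-in for real fault detection). On a fault, execution
    rolls back to the last checkpoint instead of restarting from stage 0.
    The state is assumed immutable (e.g. an int), so saving it needs no copy.

    Returns (final_state, attempts_made, checkpoints_taken).
    """
    checkpoint = (0, data)   # (index of next stage to run, state at that point)
    i, state = checkpoint
    attempts = 0
    checkpoints_taken = 0

    while i < len(stages):
        attempts += 1
        if attempts in faults:
            # Fault detected: discard work done since the last checkpoint.
            i, state = checkpoint
            continue
        state = stages[i](state)
        i += 1
        if i % checkpoint_every == 0:
            checkpoint = (i, state)
            checkpoints_taken += 1

    return state, attempts, checkpoints_taken


if __name__ == "__main__":
    # Six identical stages, one fault injected on the fourth attempt.
    stages = [lambda x: x + 1] * 6
    print(run_with_checkpoints(stages, 0, 2, {4}))    # checkpoint every 2 stages
    print(run_with_checkpoints(stages, 0, 100, {4}))  # effectively no checkpoints
```

In this toy run, the checkpointed workflow needs 8 stage attempts to finish, while the run without checkpoints needs 10 because the fault forces a restart from the first stage. Taking checkpoints more often shrinks the work lost per fault, which is consistent with the abstract's observation that a higher fault rate calls for more checkpoints.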

Index Terms

Software and its engineering > Software notations and tools > General programming languages > Language features

Published in
HPDC '10 proceedings (June 2010), 911 pages
ISBN: 9781605589428
DOI: 10.1145/1851476
General Chairs: Salim Hariri (University of Arizona), Kate Keahey (University of Chicago)
Copyright © 2010 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
Kepler
autonomic
fault tolerant
scientific workflow
Acceptance Rates
Overall Acceptance Rate: 166 of 966 submissions, 17%