Abstract
Scientific workflow systems support a variety of workflow representations, operational modes, and configurations. Regardless of the system used, end users share common needs: to track the status of their workflows in real time, to be notified automatically of execution anomalies and failures, to troubleshoot problems, and to automate the analysis of workflow results. In this paper, we describe how the Stampede monitoring infrastructure was integrated with the Pegasus Workflow Management System and the Triana workflow system to add generic real-time monitoring and troubleshooting capabilities to both. Stampede provides interoperable monitoring through a three-layer model: (1) a common data model that describes workflow and job executions; (2) high-performance tools that load workflow logs conforming to the data model into a data store; and (3) a common query interface. We also show the new analysis capabilities that Stampede brings to these workflow systems. The successful integration with both workflow engines demonstrates the generic nature of the Stampede monitoring infrastructure and its potential to serve as a common platform for monitoring across scientific workflow engines.
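The three-layer model can be illustrated with a minimal Python sketch. It is not the Stampede implementation or schema: the table layout, the `key=value` log fields (`ts`, `event`, `wf`, `job`, `status`), the event names, and all function names are hypothetical and chosen only for illustration, and an in-memory SQLite database stands in for the data store.

```python
import sqlite3

# Layer 1: a tiny, hypothetical data model for job events.
# (Assumption: this is NOT the actual Stampede schema.)
SCHEMA = """
CREATE TABLE job_event (
    ts          TEXT NOT NULL,
    workflow_id TEXT NOT NULL,
    job_id      TEXT NOT NULL,
    event       TEXT NOT NULL,
    status      INTEGER
)
"""


def open_store():
    """Create an in-memory data store that follows the data model."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def parse_line(line):
    """Parse one 'key=value key=value ...' log line into a dict."""
    return dict(field.split("=", 1) for field in line.split())


# Layer 2: a loader that moves conforming log lines into the store.
def load_events(conn, lines):
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = parse_line(line)
        status = rec.get("status")
        rows.append((
            rec["ts"],
            rec["wf"],
            rec["job"],
            rec["event"],
            int(status) if status is not None else None,
        ))
    conn.executemany("INSERT INTO job_event VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Layer 3: a query interface that is independent of the workflow engine.
def failed_jobs(conn, workflow_id):
    """Return the ids of jobs in a workflow that ended with a nonzero status."""
    cur = conn.execute(
        "SELECT job_id FROM job_event "
        "WHERE workflow_id = ? AND event = 'job.end' AND status != 0 "
        "ORDER BY job_id",
        (workflow_id,),
    )
    return [row[0] for row in cur]


# Hypothetical log lines, as any engine could emit them.
SAMPLE_LOG = [
    "ts=2012-04-01T12:00:00Z event=job.start wf=wf-1 job=a",
    "ts=2012-04-01T12:00:05Z event=job.end wf=wf-1 job=a status=0",
    "ts=2012-04-01T12:00:06Z event=job.start wf=wf-1 job=b",
    "ts=2012-04-01T12:00:09Z event=job.end wf=wf-1 job=b status=1",
]

if __name__ == "__main__":
    store = open_store()
    load_events(store, SAMPLE_LOG)
    print(failed_jobs(store, "wf-1"))
```

Because the loader and the query function depend only on the shared data model, any engine that writes logs in the agreed format could reuse them unchanged, which is the property the integration with Pegasus and Triana relies on.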
Cite this article
Vahi, K., Harvey, I., Samak, T. et al. A Case Study into Using Common Real-Time Workflow Monitoring Infrastructure for Scientific Workflows. J Grid Computing 11, 381–406 (2013). https://doi.org/10.1007/s10723-013-9265-4
DOI: https://doi.org/10.1007/s10723-013-9265-4