
A Case Study into Using Common Real-Time Workflow Monitoring Infrastructure for Scientific Workflows

Journal of Grid Computing

Abstract

Scientific workflow systems support various workflow representations, operational modes, and configurations. Regardless of the system used, end users have common needs: to track the status of their workflows in real time, to be notified automatically of execution anomalies and failures, to troubleshoot problems, and to automate the analysis of workflow results. In this paper, we describe how the Stampede monitoring infrastructure was integrated with the Pegasus Workflow Management System and the Triana workflow system to add generic real-time monitoring and troubleshooting capabilities across both systems. Stampede provides interoperable monitoring using a three-layer model: (1) a common data model to describe workflow and job executions; (2) high-performance tools to load workflow logs conforming to the data model into a data store; and (3) a common query interface. We also show the new analysis capabilities that Stampede provides to these workflow systems. The successful integration of Stampede with these workflow engines demonstrates the generic nature of the Stampede monitoring infrastructure and its potential to provide a common platform for monitoring across scientific workflow engines.
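The three-layer model described in the abstract can be illustrated with a minimal Python sketch. This is not Stampede's actual schema, loader, or query API: the `JobEvent` record, its field names, the JSON log format, the `job_event` table, and the `failed_jobs` query are all hypothetical names chosen only to show how a common data model, a loader, and an engine-independent query layer might fit together.

```python
import json
import sqlite3
from dataclasses import dataclass


# Layer 1: a common event record (hypothetical fields, not the Stampede schema).
@dataclass(frozen=True)
class JobEvent:
    workflow_id: str
    job_id: str
    state: str        # e.g. "SUBMITTED", "RUNNING", "SUCCEEDED", "FAILED"
    timestamp: float  # seconds since the epoch


def parse_event(line: str) -> JobEvent:
    """Parse one JSON log line (assumed format) into the common record."""
    raw = json.loads(line)
    return JobEvent(raw["workflow_id"], raw["job_id"], raw["state"], float(raw["ts"]))


# Layer 2: a loader that writes conforming log lines into a data store.
def load_events(conn: sqlite3.Connection, lines) -> int:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_event ("
        "workflow_id TEXT, job_id TEXT, state TEXT, timestamp REAL)"
    )
    events = [parse_event(line) for line in lines if line.strip()]
    conn.executemany(
        "INSERT INTO job_event VALUES (?, ?, ?, ?)",
        [(e.workflow_id, e.job_id, e.state, e.timestamp) for e in events],
    )
    conn.commit()
    return len(events)


# Layer 3: a query that does not depend on which engine produced the logs.
def failed_jobs(conn: sqlite3.Connection, workflow_id: str) -> list:
    rows = conn.execute(
        "SELECT DISTINCT job_id FROM job_event "
        "WHERE workflow_id = ? AND state = 'FAILED' ORDER BY job_id",
        (workflow_id,),
    )
    return [job_id for (job_id,) in rows]


# Illustrative log lines; the values are made up for the example.
sample_log = [
    '{"workflow_id": "wf-1", "job_id": "j1", "state": "SUCCEEDED", "ts": 10.0}',
    '{"workflow_id": "wf-1", "job_id": "j2", "state": "FAILED", "ts": 12.5}',
    '{"workflow_id": "wf-2", "job_id": "j1", "state": "FAILED", "ts": 13.0}',
]
conn = sqlite3.connect(":memory:")
loaded = load_events(conn, sample_log)
print(loaded, failed_jobs(conn, "wf-1"))
```

In this sketch, any workflow engine that can emit log lines in the agreed format could share the same loader and queries, which is the kind of interoperability the abstract attributes to a common data model.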

Author information

Correspondence to Karan Vahi.

About this article

Cite this article

Vahi, K., Harvey, I., Samak, T. et al. A Case Study into Using Common Real-Time Workflow Monitoring Infrastructure for Scientific Workflows. J Grid Computing 11, 381–406 (2013). https://doi.org/10.1007/s10723-013-9265-4

  • DOI: https://doi.org/10.1007/s10723-013-9265-4
