
XML database support for distributed execution of data-intensive scientific workflows

Published: 01 September 2005

Abstract

In this paper we look at the application of XML data management support in scientific data analysis workflows. We describe a software infrastructure that aims to address issues associated with metadata management, data storage and management, and execution of data analysis workflows on distributed storage and compute platforms. This system couples a distributed, filter-stream based dataflow engine with a distributed XML-based data and metadata management system. We present experimental results from a biomedical image analysis use case that involves processing of digitized microscopy images for feature segmentation.


Published in

ACM SIGMOD Record, Volume 34, Issue 3
September 2005
115 pages
ISSN: 0163-5808
DOI: 10.1145/1084805

Copyright © 2005 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
