
Enabling Ad Hoc Queries over Low-Level Scientific Data Sets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5566)

Abstract

Technological advances have produced massive amounts of data for scientific analysis. To enable all classes of users to utilize these data sets effectively, it is crucial to support intuitive interfaces for data access and manipulation. This paper describes an autonomous scientific workflow system that enables high-level, natural-language-based queries over low-level data sets. Our technique combines natural language processing, metadata indexing, and a semantically aware workflow composition engine that dynamically constructs workflows to answer queries based on service and data availability. A specific contribution of this work is a metadata registration scheme that unifies heterogeneous metadata formats and service annotations in a single index. Our approach thus avoids both imposing a standardized format for storing all data sets and implementing a federated, mediator-based querying framework. We have evaluated our system using a case study from the geospatial domain to present functional results. Our evaluation supports the potential benefits that our approach can offer to scientific workflow systems and other domain-specific, data-intensive applications.
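The pipeline the abstract describes (map a natural-language query to a target data type, consult a unified metadata index, then compose services until every input is satisfied by available data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword lexicon stands in for real natural language processing; the format names `format_a` and `format_b`, their element paths, the data types, and the `extract_shoreline` service are all hypothetical; and composition here is a simple depth-first backward-chaining search.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical registration table: metadata format name -> ElementTree path
# to the element that names the data type a record provides.
FORMAT_REGISTRY: dict[str, str] = {}


def register_format(fmt: str, type_path: str) -> None:
    """Register a metadata format by stating where its data-type field lives."""
    FORMAT_REGISTRY[fmt] = type_path


class UnifiedIndex:
    """One index over records in any registered format, keyed by data type."""

    def __init__(self) -> None:
        self._by_type: dict[str, list[str]] = {}

    def add(self, dataset_id: str, fmt: str, xml_text: str) -> None:
        root = ET.fromstring(xml_text)
        node = root.find(FORMAT_REGISTRY[fmt])
        if node is None or not (node.text or "").strip():
            raise ValueError(f"{dataset_id}: no data type at {FORMAT_REGISTRY[fmt]}")
        self._by_type.setdefault(node.text.strip(), []).append(dataset_id)

    def lookup(self, data_type: str) -> list[str]:
        return list(self._by_type.get(data_type, []))


@dataclass(frozen=True)
class Service:
    """Hypothetical service annotation: consumed and produced data types."""
    name: str
    inputs: tuple[str, ...]
    output: str


# Toy lexicon standing in for real natural language processing.
LEXICON = {"shoreline": "shoreline", "coastline": "shoreline"}


def query_to_type(query: str) -> str | None:
    """Return the first data type whose phrase appears in the query."""
    text = query.lower()
    for phrase, data_type in LEXICON.items():
        if phrase in text:
            return data_type
    return None


def compose(target, index, services, visiting=frozenset()):
    """Backward-chain from the target data type.

    Returns ("data", dataset_id) when indexed data already provides the
    target, (service_name, [subplans]) when a service can derive it, or
    None when neither data nor services can produce it.
    """
    available = index.lookup(target)
    if available:
        return ("data", available[0])
    if target in visiting:  # guard against cyclic service chains
        return None
    for svc in services:
        if svc.output != target:
            continue
        subplans = []
        for needed in svc.inputs:
            sub = compose(needed, index, services, visiting | {target})
            if sub is None:
                break  # this service cannot be fed; try the next one
            subplans.append(sub)
        else:
            return (svc.name, subplans)
    return None


if __name__ == "__main__":
    register_format("format_a", "info/theme")
    register_format("format_b", "header/kind")
    index = UnifiedIndex()
    index.add("ctm_01", "format_a", "<meta><info><theme>terrain_model</theme></info></meta>")
    index.add("wl_07", "format_b", "<rec><header><kind>water_level</kind></header></rec>")
    services = [Service("extract_shoreline", ("terrain_model", "water_level"), "shoreline")]
    target = query_to_type("What was the shoreline near the station on 2008-06-01?")
    print(compose(target, index, services))
    # prints ('extract_shoreline', [('data', 'ctm_01'), ('data', 'wl_07')])
```

Chaining backward from the requested data type means only services relevant to the query are considered, and the `visiting` set stops cyclic service chains. Because format differences are confined to the registration table, supporting another metadata format in this sketch takes one `register_format` call rather than converting existing records to a common schema.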




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chiu, D., Agrawal, G. (2009). Enabling Ad Hoc Queries over Low-Level Scientific Data Sets. In: Winslett, M. (ed.) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_17


  • DOI: https://doi.org/10.1007/978-3-642-02279-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02278-4

  • Online ISBN: 978-3-642-02279-1

  • eBook Packages: Computer Science, Computer Science (R0)
