Supporting SQL-3 Aggregations on Grid-Based Data Repositories

  • Conference paper
Languages and Compilers for High Performance Computing (LCPC 2004)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3602)

Abstract

There is an increasing trend towards distributed and shared repositories for storing scientific datasets. Developing applications that retrieve and process data from such repositories involves a number of challenges. First, these data repositories store data in complex, low-level layouts, which should be abstracted from application developers. Second, as data repositories are shared resources, part of the computation on the data must be performed on a different set of machines than the ones hosting the data. Third, because of the volume of data and the amount of computation involved, parallel configurations need to be used both for hosting the data and for processing the retrieved data.

In this paper, we describe a system for executing SQL-3 queries over scientific data stored as flat files. A relational table-based virtual view is supported on these flat-file datasets. The class of queries we consider involves data retrieval using Select and Where clauses, and processing with user-defined aggregate functions and group-bys. We use a middleware system, STORM, to provide much of the low-level functionality. Our compiler analyzes the SQL-3 queries and generates many of the functions required by this middleware. Our experimental results show good scalability with respect to the number of nodes as well as the dataset size.
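
The query class described above can be illustrated with a minimal sketch. It is not the paper's compiler output or the STORM API; the record layout, the `MeanState` aggregate, and the function names `scan`, `local_aggregate`, and `global_aggregate` are all assumptions made for illustration. The sketch exposes a flat byte buffer as a virtual table, applies a Where-style filter, and evaluates a user-defined aggregate with a group-by in two phases: partial aggregation on each data partition, then a merge of the partial states.

```python
import struct
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical fixed-width binary record layout: (x: int32, y: int32, value: float64).
RECORD = struct.Struct("<iid")


def scan(buf):
    """Expose a flat byte buffer as rows of a virtual table (x, y, value)."""
    for offset in range(0, len(buf), RECORD.size):
        x, y, value = RECORD.unpack_from(buf, offset)
        yield {"x": x, "y": y, "value": value}


@dataclass
class MeanState:
    """Running state of an illustrative user-defined aggregate (a mean)."""
    total: float = 0.0
    count: int = 0

    def accumulate(self, v):
        self.total += v
        self.count += 1

    def merge(self, other):
        self.total += other.total
        self.count += other.count

    def finalize(self):
        return self.total / self.count if self.count else None


def local_aggregate(buf, where, group_key):
    """Phase 1, per data partition: filter rows and build partial group states."""
    groups = defaultdict(MeanState)
    for row in scan(buf):
        if where(row):
            groups[group_key(row)].accumulate(row["value"])
    return groups


def global_aggregate(partials):
    """Phase 2: merge partial states from all partitions and finalize each group."""
    merged = defaultdict(MeanState)
    for partial in partials:
        for key, state in partial.items():
            merged[key].merge(state)
    return {key: state.finalize() for key, state in merged.items()}


# Roughly: SELECT x, MEAN(value) FROM view WHERE y >= 0 GROUP BY x
part_a = b"".join(RECORD.pack(*r) for r in [(1, 0, 2.0), (1, 5, 4.0), (2, -1, 9.0)])
part_b = b"".join(RECORD.pack(*r) for r in [(1, 3, 6.0), (2, 2, 8.0)])
partials = [local_aggregate(p, lambda r: r["y"] >= 0, lambda r: r["x"])
            for p in (part_a, part_b)]
result = global_aggregate(partials)
print(result)  # {1: 4.0, 2: 8.0}
```

Keeping the aggregate as an explicit state with `accumulate`, `merge`, and `finalize` steps is one common way to let filtering and partial aggregation run where the data is hosted while the final merge runs on other machines; whether the system in the paper divides the work this way is not stated in the abstract.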


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Weng, L., Agrawal, G., Catalyurek, U., Saltz, J. (2005). Supporting SQL-3 Aggregations on Grid-Based Data Repositories. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds) Languages and Compilers for High Performance Computing. LCPC 2004. Lecture Notes in Computer Science, vol 3602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532378_21

  • DOI: https://doi.org/10.1007/11532378_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28009-5

  • Online ISBN: 978-3-540-31813-2

  • eBook Packages: Computer Science, Computer Science (R0)
