Algebraic manipulation of scientific datasets

  • Regular Paper
The VLDB Journal

Abstract

We investigate algebraic processing strategies for large numeric datasets equipped with a (possibly irregular) grid structure. Such datasets arise, for example, in computational simulations, observation networks, medical imaging, and 2-D and 3-D rendering. Existing approaches for manipulating these datasets are incomplete. The performance of SQL queries for manipulating large numeric datasets is not competitive with specialized tools. Database extensions for processing multidimensional discrete data can only model regular, rectilinear grids. Visualization software libraries are designed to process arbitrary gridded datasets efficiently, but no algebra has been developed to simplify their use and afford optimization. Further, these libraries are data dependent: physical changes to data representation or organization break user programs. In this paper, we present an algebra of gridfields for manipulating arbitrary gridded datasets, algebraic optimization techniques, and an implementation backed by experimental results. We compare our techniques to those of Geographic Information Systems (GIS) and visualization software libraries, using real examples from an Environmental Observation and Forecasting System. We find that our approach can express optimized plans inaccessible to other techniques, resulting in improved performance with reduced programming effort.

Author information

Corresponding author

Correspondence to Bill Howe.

About this article

Cite this article

Howe, B., Maier, D. Algebraic manipulation of scientific datasets. The VLDB Journal 14, 397–416 (2005). https://doi.org/10.1007/s00778-005-0157-5

  • DOI: https://doi.org/10.1007/s00778-005-0157-5
