DOI: 10.1145/1066677.1066793
Article

A hybrid approach for multiresolution modeling of large-scale scientific data

Published: 13 March 2005

Abstract

Simulations of complex scientific phenomena involve the execution of massively parallel computer programs. These simulation programs generate large-scale multidimensional data sets over a spatio-temporal region. Analyzing such massive data sets is an essential step in helping scientists glean new information. To this end, efficient and effective data models are needed. In this paper, we present a hybrid approach for constructing data models from large-scale multidimensional scientific data sets. Our models not only provide descriptive information about the data but also allow users to subsequently examine the data by querying the data models.

Our approach combines a multiresolution-topological model of the data with a multivariate-physical model of the data to generate one hierarchical data model that efficiently captures both the spatio-temporal and the physical aspects of the data. In particular, this hybrid approach consists of three phases. In the first phase, we build a multiresolution model that encapsulates the data set's spatial information (i.e., topology and spatial connectivity). In the second phase, we build a multivariate model by clustering the physical dimensions of the data set. Physical dimensions refer to those dimensions that are neither spatial (x, y, z) nor temporal (time). Excluding the spatio-temporal dimensions from this clustering phase is important, since data points with "similar" characteristics could be located (spatially) far from each other. Finally, in the third phase, we connect the multivariate-physical model to the multiresolution-topological model by utilizing ideas from information retrieval. The third phase is essential because the multivariate-physical model contains no topological information, without which it lacks accurate spatial context.

Experimental evaluations on two large-scale multidimensional scientific data sets illustrate the value of our hybrid approach.
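The three phases described in the abstract can be illustrated with a small sketch. The abstract does not specify the algorithms used, so everything below is an assumption made for illustration: a median-split spatial tree stands in for the multiresolution-topological model, plain k-means stands in for the multivariate-physical model, and per-node cluster-label frequencies (by analogy with term frequencies in a document) stand in for the information-retrieval linkage. All function names and parameters (`build_spatial_tree`, `kmeans`, `annotate`, `query`, `min_fraction`) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

def build_spatial_tree(points, ids, depth=0, max_depth=3, leaf_size=4):
    """Phase 1 (assumed form): recursive median split on alternating spatial axes."""
    node = {"ids": ids, "children": []}
    if depth >= max_depth or len(ids) <= leaf_size:
        return node
    axis = depth % len(points[0])
    ordered = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    node["children"] = [
        build_spatial_tree(points, ordered[:mid], depth + 1, max_depth, leaf_size),
        build_spatial_tree(points, ordered[mid:], depth + 1, max_depth, leaf_size),
    ]
    return node

def kmeans(rows, k, iters=20, seed=0):
    """Phase 2 (assumed form): plain k-means over the physical variables only."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centers[c])))
            for r in rows
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def annotate(node, labels):
    """Phase 3 (assumed form): store cluster-label counts on every spatial node."""
    node["tf"] = Counter(labels[i] for i in node["ids"])
    for child in node["children"]:
        annotate(child, labels)

def query(node, cluster, min_fraction=0.9):
    """Return the coarsest nodes where `cluster` covers at least min_fraction of points."""
    if node["tf"][cluster] / len(node["ids"]) >= min_fraction:
        return [node]
    hits = []
    for child in node["children"]:
        hits.extend(query(child, cluster, min_fraction))
    return hits

# Toy data: an 8x8 grid whose single physical variable is high for x < 4.
points = [(x, y) for x in range(8) for y in range(8)]
physical = [[100.0 + y] if x < 4 else [float(y)] for x, y in points]

labels = kmeans(physical, k=2)          # clusters ignore x and y entirely
tree = build_spatial_tree(points, list(range(len(points))))
annotate(tree, labels)

hot = labels[0]                          # cluster label of the point at (0, 0)
hits = query(tree, hot, min_fraction=0.9)
print(len(hits), sorted({points[i][0] for i in hits[0]["ids"]}))
```

In this sketch the clustering never sees the spatial coordinates, which mirrors the abstract's point that similar physical behavior may occur far apart in space. The annotation step is what restores spatial context, so a query can be answered at the coarsest tree level that is sufficiently dominated by a given cluster.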

Published In

SAC '05: Proceedings of the 2005 ACM symposium on Applied computing
March 2005
1814 pages
ISBN: 1581139640
DOI: 10.1145/1066677

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information retrieval
  2. large-scale scientific data sets
  3. multiresolution indices
  4. multivariate clusters
  5. topological models

Qualifiers

  • Article

Conference

SAC05: The 2005 ACM Symposium on Applied Computing
March 13 - 17, 2005
Santa Fe, New Mexico

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 342
  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 16 Feb 2025
