Abstract
Scientific datasets are often stored on distributed archival storage systems, both because geographically distributed sensor devices store their datasets on local machines and because the size of scientific datasets demands a large amount of disk space. Multidimensional indexing techniques have been shown to greatly improve range query performance on large scientific datasets. In this paper, we discuss several ways of distributing a multidimensional index in order to speed up access to large distributed scientific datasets. We compare the designs, challenges, and problems of distributed multidimensional indexing schemes, and present a comprehensive performance study of distributed indexing that provides guidelines for choosing a distributed multidimensional index for a specific data analysis application.
Cite this article
Nam, B., Sussman, A. Analyzing design choices for distributed multidimensional indexing. J Supercomput 59, 1552–1576 (2012). https://doi.org/10.1007/s11227-011-0567-7
DOI: https://doi.org/10.1007/s11227-011-0567-7