Analyzing design choices for distributed multidimensional indexing

The Journal of Supercomputing

Abstract

Scientific datasets are often stored on distributed archival storage systems, both because geographically distributed sensor devices store the datasets on their local machines and because the size of scientific datasets demands a large amount of disk space. Multidimensional indexing techniques have been shown to greatly improve range query performance on large scientific datasets. In this paper, we discuss several ways of distributing a multidimensional index in order to speed up access to large distributed scientific datasets. We compare the designs, challenges, and problems of distributed multidimensional indexing schemes, and present a comprehensive performance study of distributed indexing that provides guidelines for choosing a distributed multidimensional index for a specific data analysis application.



Author information

Corresponding author

Correspondence to Beomseok Nam.


About this article

Cite this article

Nam, B., Sussman, A. Analyzing design choices for distributed multidimensional indexing. J Supercomput 59, 1552–1576 (2012). https://doi.org/10.1007/s11227-011-0567-7


  • DOI: https://doi.org/10.1007/s11227-011-0567-7
