Disk Allocation for Fast Range and Nearest-Neighbor Queries

Prabhakar, Sunil; Agrawal, Divyakant; Abbadi, Amr El

doi:10.1023/A:1024895525526

Published: September 2003

Distributed and Parallel Databases, Volume 14, pages 107–135 (2003)

Sunil Prabhakar¹,
Divyakant Agrawal² &
Amr El Abbadi²

Abstract

As databases increasingly integrate non-textual multimedia information, it is becoming necessary to support efficient similarity searching in addition to range searching. Range and nearest-neighbor (similarity) queries are the most important class of queries for multimedia and multi-dimensional databases. Due to the large sizes of the datasets involved, I/O is a critical factor limiting performance. The use of parallel I/O through declustering of the data is a promising approach to improve performance. Consequently, several research efforts have addressed the problem of declustering multidimensional data for optimizing range and partial match queries. Very limited work has been done for similarity queries, and the problem of declustering for combined range and similarity queries has not been addressed in the literature. Consider a dataset of images where the following metadata is also stored for each image: the date on which the picture was taken, and the longitude and latitude of the site of the picture. An example of a combined query is: given a target image, find the 5 most similar images taken within 3 months of the target image and located within 2 degrees of longitude and latitude of the target image. To answer this query, it is necessary to conduct a range search on the date, longitude and latitude values and a similarity search on the image content.
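The combined query in the example can be sketched in a few lines of Python. This is only an illustration of the query semantics, not the paper's method: it scans an in-memory list, ignores disks and indexing entirely, and assumes a hypothetical `Image` record whose `features` tuple stands in for the image content, with Euclidean distance as the similarity measure and 90 days as an approximation of "3 months".

```python
from dataclasses import dataclass
from datetime import date, timedelta
import math


@dataclass
class Image:
    # Hypothetical record layout; the abstract only names the metadata fields.
    name: str
    taken: date
    lon: float
    lat: float
    features: tuple  # assumed stand-in for the image content


def combined_query(images, target, k=5, days=90, degrees=2.0):
    """Range filter on date/longitude/latitude, then k-nearest by features."""
    window = timedelta(days=days)
    # Range part: keep images close to the target in time and location.
    candidates = [
        img for img in images
        if img is not target
        and abs(img.taken - target.taken) <= window
        and abs(img.lon - target.lon) <= degrees
        and abs(img.lat - target.lat) <= degrees
    ]
    # Similarity part: order the survivors by distance in feature space.
    candidates.sort(key=lambda img: math.dist(img.features, target.features))
    return candidates[:k]


target = Image("target", date(2003, 9, 1), 10.0, 20.0, (0.0, 0.0))
images = [
    target,
    Image("near", date(2003, 8, 1), 10.5, 20.5, (1.0, 0.0)),
    Image("far_in_time", date(2002, 1, 1), 10.0, 20.0, (0.1, 0.0)),
    Image("far_in_space", date(2003, 9, 2), 15.0, 20.0, (0.1, 0.0)),
    Image("close", date(2003, 10, 1), 9.0, 21.0, (0.5, 0.0)),
]
result = [img.name for img in combined_query(images, target)]
print(result)  # ['close', 'near']
```

The two images with the most similar features are excluded because they fail the range predicate, which is why both kinds of search are needed to answer the query.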

In this paper, we develop new declustering schemes that provide good declustering for similarity searching. In addition, we show that the new schemes perform very well for range queries as well as combined queries. The new schemes are based upon the Cyclic declustering schemes, which were developed for range and partial match queries. The Cyclic schemes not only provide superior performance to earlier schemes, but are also very robust and consistent with respect to query types and variations in system parameters.
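The abstract does not define the Cyclic schemes, so the following is only a rough sketch of the general idea of declustering a two-dimensional grid of buckets across disks. It assumes an allocation of the form `(i + skip * j) mod num_disks`, where `skip` is a hypothetical parameter; the function names and the cost measure (the largest number of buckets any single disk must read for a query) are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import product


def cyclic_disk(i, j, num_disks, skip):
    # Assumed allocation rule: bucket (i, j) goes to disk (i + skip*j) mod M.
    return (i + skip * j) % num_disks


def range_query_cost(rows, cols, num_disks, skip):
    """Parallel I/O cost of a range query: the maximum number of
    buckets that any one disk has to retrieve."""
    load = Counter(
        cyclic_disk(i, j, num_disks, skip) for i, j in product(rows, cols)
    )
    return max(load.values())


def optimal_cost(rows, cols, num_disks):
    # Lower bound: buckets spread perfectly evenly (ceiling division).
    return -(-(len(rows) * len(cols)) // num_disks)


# A 2x2 range query on 5 disks, under two choices of the skip value.
rows, cols = range(2), range(2)
print(range_query_cost(rows, cols, 5, skip=1))  # 2
print(range_query_cost(rows, cols, 5, skip=2))  # 1
print(optimal_cost(rows, cols, 5))              # 1
```

In this small case, `skip=2` places the four buckets of the query on four different disks and meets the lower bound, while `skip=1` puts two of them on the same disk. How the skip value should be chosen in general is outside what the abstract states.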

Author information

Authors and Affiliations

Department of Computer Sciences, Purdue University, West Lafayette, IN, 47907, USA
Sunil Prabhakar
Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA
Divyakant Agrawal & Amr El Abbadi

About this article

Cite this article

Prabhakar, S., Agrawal, D. & Abbadi, A.E. Disk Allocation for Fast Range and Nearest-Neighbor Queries. Distributed and Parallel Databases 14, 107–135 (2003). https://doi.org/10.1023/A:1024895525526

Issue Date: September 2003
DOI: https://doi.org/10.1023/A:1024895525526
