Skip to main content

Partition-Based Similarity Join in High Dimensional Data Spaces

  • Conference paper
  • First Online:
Database and Expert Systems Applications (DEXA 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2453))

Included in the following conference series:

Abstract

It is not desirable in the performance perspective of search algorithms to partition a high dimensional data space by dividing all the dimensions. This is because the number of cells resulted from partitioning explodes as the number of partitioning dimensions increases, thus making any search method based on space partitioning impractical. To address this problem, we propose an algorithm to dynamically select partitioning dimensions based on a data sampling method for efficient similarity join processing. Futhermore, a disk-based plane sweeping method is proposed to minimize the cost of joins between the partitioned cells. The experimental results show that the proposed schemes substantially improve the performance of the partition-based similarity joins in high dimensional data spaces.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. C. Böhm, B. Braunmuller, F. Krebs, and H.-P. Kriegel. Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data. In Proceedings of the 2001 ACM-SIGMOD Conference, 2001.

    Google Scholar 

  2. Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R.-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM-SIGMOD Conference, pages 322–331, Atlantic City, NJ, May 1990.

    Google Scholar 

  3. S. Berchtold, D. A. Keim, and H.-P. Kriegel. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd VLDB Conference, Bombay, India, September 1996.

    Google Scholar 

  4. Thomas Brinkho., Hans-Peter Kriegel, and Bernhard Seeger. Efficient processing of spatial joins using R-Trees. In Proceedings of the 1993 ACM-SIGMOD Conference, pages 237–246, Washington, DC, May 1993.

    Google Scholar 

  5. Antonin Guttman. R-Trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM-SIGMOD Conference, pages 47–57, Boston, MA, June 1984.

    Google Scholar 

  6. Gisli R. Hjaltason and Hanan Samet. Incremental distance join algorithms for spatial databases. In Proceedings of the 1998 ACM-SIGMOD Conference, pages 237–248, Seattle, WA, June 1998.

    Google Scholar 

  7. Y.W. Huang, N. Jing, and E. Rundensteiner. Spatial joins using R-trees: Breadth-first traversal with global optimizations. In Proceedings of the 23rd VLDB Conference, 1997.

    Google Scholar 

  8. N. Koudas and C. Sevcik. High dimensional similarity joins: Algorithms and performance evaluation. In Proceedings of the 1998 IEEE International Conference on Data Engineering, 1998.

    Google Scholar 

  9. K. Lin, H. V. Jagadish, and C. Faloutsos. The TV-tree: An index structure for high-dimensional data. VLDB Journal, 3:517–542, 1995.

    Article  Google Scholar 

  10. Ming-Ling Lo and Chinya V. Ravishankar. Spatial hash-join. In Proceedings of the 1996 ACM-SIGMOD Conference, pages 247–258, Montreal, Canada, June 1996.

    Google Scholar 

  11. Jignesh M. Patel and David J. DeWitt. Partition based spatial-merge join. In Proceedings of the 1996 ACM-SIGMOD Conference, pages 259–270, Montreal, Canada, June 1996.

    Google Scholar 

  12. K. Shim, R. Srikant, and R. Agrawal. High-dimensional similarity joins. In Proceedings of the 1997 IEEE International Conference on Data Engineering, 1997.

    Google Scholar 

  13. Hyoseop Shin, Bongki Moon, and Sukho Lee. Adaptive multi-stage distance join processing. In Proceedings of the 2000 ACM-SIGMOD Conference, pages 343–354, Dallas, TX, May 2000.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shin, H., Moon, B., Lee, S. (2002). Partition-Based Similarity Join in High Dimensional Data Spaces. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2002. Lecture Notes in Computer Science, vol 2453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46146-9_73

Download citation

  • DOI: https://doi.org/10.1007/3-540-46146-9_73

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44126-7

  • Online ISBN: 978-3-540-46146-3

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics