Skip to main content

Partition-Based Similarity Joins Using Diagonal Dimensions in High Dimensional Data Spaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4224))

Abstract

Distributions of very high dimensional data are, in most cases, not even, but skewed. For this reason, there can be more effective dimensions than others in partitioning a high dimensional data set. Effective dimensions can be used to partition the data set in more balanced way so that data are located in more evenly distributed. In this paper, we present schemes to select dimensions by which high dimensional data sets are partitioned for efficient similarity joins. Especially, in order to efficiently reduce the number of partition dimensions, we propose a novel scheme using diagonal dimensions compared with perpendicular dimensions. The experimental results show that the proposed schemes substantially improve the performance of the partition-based similarity joins in high dimensional data spaces.

This paper was supported by Konkuk University in 2006.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  • Brinkhoff, T., Kriegel, H.-P., Seeger, B.: Efficient processing of spatial joins using R-trees. In: Proceedings of the 1996 VLDB Conference, Bombay, India (September 1996)

    Google Scholar 

  • Huang, Y.W., Jing, N., Rundensteiner, E.: Spatial joins using R-trees: Breadth-first traversal with global optimizations. In: Proceedings of the 1997 VLDB Conference (1997)

    Google Scholar 

  • Hjaltason, G.R., Samet, H.: Incremental distance join algorithms for spatial databases. In: Proceedings of the 1998 ACM-SIGMOD Conference, Seattle, WA, June 1998, pp. 237–248 (1998)

    Google Scholar 

  • Shin, H., Moon, B., Lee, S.: Adaptive multi-stage distance join processing. In: Proceedings of the 2000 ACM-SIGMOD Conference, Dallas, TX, May 2000, pp. 343–354 (2000)

    Google Scholar 

  • Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM-SIGMOD Conference, Boston, MA, June 1984, pp. 47–57 (1984)

    Google Scholar 

  • Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: An efficient and robust access method for points and rectangles. In: Proceedings of the 1990 ACM-SIGMOD Conference, Atlantic City, NJ, May 1990, pp. 322–331 (1990)

    Google Scholar 

  • Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: An index structure for highdimensional data. In: Proceedings of the 1996 Conference, Bombay, India (September 1996)

    Google Scholar 

  • Patel, J.M., Dewitt, D.J.: Partition based spatial-merge join. In: Proceedings of the 1996 ACM-SIGMOD Conference, Montreal, Canada, June 1996, pp. 259–270 (1996)

    Google Scholar 

  • Lo, M.-L., Ravishankar, C.V.: Spatial hash join. In: Proceedings of the 1996 ACM-SIGMOD Conference, Montreal, Canada pp. 247–258 (June 1996)

    Google Scholar 

  • Shim, K., Srikant, R., Agrawal, R.: High-dimensional similarity joins. In: Proceedings of the 1997 IEEE International Conference on Data Engineering (1997)

    Google Scholar 

  • Koudas, N., Sevcik, C.: High dimensional similarity joins: Algorithms and performance evaluation. In: Proceedings of the 1998 IEEE International Conference on Data Engineering (1998)

    Google Scholar 

  • Bohm, C., Braunmuller, B., Krebs, F., Kriegel, H.-P.: Epsilon grid order. An algorithm for the similarity join on massive high-dimensional data. In: Proceedings of the 2001 ACM-SIGMOD Conference (2001)

    Google Scholar 

  • Lin, K.-I., Jagadish, H.V., Faloutsos, C.: The TV-Tree: An Index Structure for High-Dimensional Data. VLDB Journal 3(4), 517–542 (1994)

    Article  Google Scholar 

  • Shin, H., Moon, B., Lee, S.: Partition-Based Similarity Join in High Dimensional Data Spaces. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds.) DEXA 2002. LNCS, vol. 2453, pp. 741–750. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shin, H. (2006). Partition-Based Similarity Joins Using Diagonal Dimensions in High Dimensional Data Spaces. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_66

Download citation

  • DOI: https://doi.org/10.1007/11875581_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45485-4

  • Online ISBN: 978-3-540-45487-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics