Partition-Based Similarity Joins Using Diagonal Dimensions in High Dimensional Data Spaces

Shin, Hyoseop

doi:10.1007/11875581_66

Partition-Based Similarity Joins Using Diagonal Dimensions in High Dimensional Data Spaces

Hyoseop Shin²⁰

Conference paper

1247 Accesses
1 Citations

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4224))

Abstract

Distributions of very high dimensional data are, in most cases, not even, but skewed. For this reason, there can be more effective dimensions than others in partitioning a high dimensional data set. Effective dimensions can be used to partition the data set in more balanced way so that data are located in more evenly distributed. In this paper, we present schemes to select dimensions by which high dimensional data sets are partitioned for efficient similarity joins. Especially, in order to efficiently reduce the number of partition dimensions, we propose a novel scheme using diagonal dimensions compared with perpendicular dimensions. The experimental results show that the proposed schemes substantially improve the performance of the partition-based similarity joins in high dimensional data spaces.

This paper was supported by Konkuk University in 2006.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Brinkhoff, T., Kriegel, H.-P., Seeger, B.: Efficient processing of spatial joins using R-trees. In: Proceedings of the 1996 VLDB Conference, Bombay, India (September 1996)
Google Scholar
Huang, Y.W., Jing, N., Rundensteiner, E.: Spatial joins using R-trees: Breadth-first traversal with global optimizations. In: Proceedings of the 1997 VLDB Conference (1997)
Google Scholar
Hjaltason, G.R., Samet, H.: Incremental distance join algorithms for spatial databases. In: Proceedings of the 1998 ACM-SIGMOD Conference, Seattle, WA, June 1998, pp. 237–248 (1998)
Google Scholar
Shin, H., Moon, B., Lee, S.: Adaptive multi-stage distance join processing. In: Proceedings of the 2000 ACM-SIGMOD Conference, Dallas, TX, May 2000, pp. 343–354 (2000)
Google Scholar
Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM-SIGMOD Conference, Boston, MA, June 1984, pp. 47–57 (1984)
Google Scholar
Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: An efficient and robust access method for points and rectangles. In: Proceedings of the 1990 ACM-SIGMOD Conference, Atlantic City, NJ, May 1990, pp. 322–331 (1990)
Google Scholar
Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: An index structure for highdimensional data. In: Proceedings of the 1996 Conference, Bombay, India (September 1996)
Google Scholar
Patel, J.M., Dewitt, D.J.: Partition based spatial-merge join. In: Proceedings of the 1996 ACM-SIGMOD Conference, Montreal, Canada, June 1996, pp. 259–270 (1996)
Google Scholar
Lo, M.-L., Ravishankar, C.V.: Spatial hash join. In: Proceedings of the 1996 ACM-SIGMOD Conference, Montreal, Canada pp. 247–258 (June 1996)
Google Scholar
Shim, K., Srikant, R., Agrawal, R.: High-dimensional similarity joins. In: Proceedings of the 1997 IEEE International Conference on Data Engineering (1997)
Google Scholar
Koudas, N., Sevcik, C.: High dimensional similarity joins: Algorithms and performance evaluation. In: Proceedings of the 1998 IEEE International Conference on Data Engineering (1998)
Google Scholar
Bohm, C., Braunmuller, B., Krebs, F., Kriegel, H.-P.: Epsilon grid order. An algorithm for the similarity join on massive high-dimensional data. In: Proceedings of the 2001 ACM-SIGMOD Conference (2001)
Google Scholar
Lin, K.-I., Jagadish, H.V., Faloutsos, C.: The TV-Tree: An Index Structure for High-Dimensional Data. VLDB Journal 3(4), 517–542 (1994)
Article Google Scholar
Shin, H., Moon, B., Lee, S.: Partition-Based Similarity Join in High Dimensional Data Spaces. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds.) DEXA 2002. LNCS, vol. 2453, pp. 741–750. Springer, Heidelberg (2002)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Department of Internet and Multimedia Engineering, Konkuk University, Seoul, Korea
Hyoseop Shin

Authors

Hyoseop Shin
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Escuela Politécnica Superior, GICAP Research Group, Universidad de Burgo, Calle Francisco de Vitoria S/N, Edifico C, Campus Vena, 09006, Burgos, Spain
Emilio Corchado
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin
Department of Information Systems and Computation, Technical University of Valencia, Camino de Vera, Valencia, Spain
Vicente Botti
University of West Scotland, PA1 2BE, Paisley, Scotland
Colin Fyfe

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Shin, H. (2006). Partition-Based Similarity Joins Using Diagonal Dimensions in High Dimensional Data Spaces. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_66

Download citation

DOI: https://doi.org/10.1007/11875581_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics