Efficient top-k similarity join processing over multi-valued objects

Zhang, Wenjie; Zhan, Liming; Zhang, Ying; Cheema, Muhammad Aamir; Lin, Xuemin

doi:10.1007/s11280-012-0201-5

Published: 07 February 2013

World Wide Web, Volume 17, pages 285–309 (2014)

Abstract

Top-k similarity joins have been extensively studied and are used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects $\mathcal U$ and $\mathcal V$, a top-k similarity join returns the k most similar pairs of objects from $\mathcal U \times \mathcal V$. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space, and similarity is measured by a simple distance metric such as Euclidean distance. However, in many applications an object may be described by multiple values (instances), and the conventional model is not applicable because it does not account for the distribution of an object's instances. In this paper, we study top-k similarity joins over multi-valued objects. We apply two types of quantile-based distance measures, ϕ-quantile distance and ϕ-quantile group-base distance, to capture the relative distribution of the instances of two objects. Efficient and effective techniques for processing top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-based, statistic-based and weight-based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
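
To make the problem statement concrete, the following is a minimal brute-force sketch of a top-k similarity join over multi-valued objects. It is not the filtering-refinement method of the paper and uses none of its pruning techniques. The abstract does not define the ϕ-quantile distance, so the sketch assumes one plausible reading: each object is a list of weighted instances whose weights sum to 1, each instance pair is weighted by the product of the two instance weights, and the ϕ-quantile distance is the smallest pairwise Euclidean distance at which the cumulative pair weight reaches ϕ. The function names `quantile_distance` and `top_k_join` are hypothetical.

```python
import heapq
import math
from itertools import product


def quantile_distance(u, v, phi):
    """Assumed phi-quantile distance between two multi-valued objects.

    u and v are lists of (point, weight) instances; the weights of each
    object are assumed to sum to 1. This definition is an assumption made
    for illustration, not taken from the abstract.
    """
    # Every instance pair contributes its Euclidean distance with a weight
    # equal to the product of the two instance weights.
    pairs = sorted(
        (math.dist(p, q), wu * wv) for (p, wu), (q, wv) in product(u, v)
    )
    cumulative = 0.0
    for dist, weight in pairs:
        cumulative += weight
        # Small tolerance guards against floating-point rounding.
        if cumulative >= phi - 1e-12:
            return dist
    return pairs[-1][0]


def top_k_join(U, V, k, phi):
    """Return the k closest (distance, i, j) pairs from U x V by exhaustive search."""
    scored = (
        (quantile_distance(u, v, phi), i, j)
        for (i, u), (j, v) in product(enumerate(U), enumerate(V))
    )
    return heapq.nsmallest(k, scored)


# Tiny example: one object in U, two objects in V, phi = 0.5.
u0 = [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
v0 = [((0.0, 0.0), 0.5), ((3.0, 0.0), 0.5)]
v1 = [((10.0, 0.0), 1.0)]
result = top_k_join([u0], [v0, v1], k=1, phi=0.5)
```

This exhaustive version evaluates every object pair and every instance pair, which is the cost that a filtering-refinement approach with pruning is intended to avoid.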

Author information

Authors and Affiliations

School of Computer Science & Engineering, University of New South Wales, Sydney, Australia
Wenjie Zhang, Liming Zhan, Ying Zhang, Muhammad Aamir Cheema & Xuemin Lin

Corresponding author

Correspondence to Wenjie Zhang.

Additional information

Wenjie Zhang was partially supported by ARC DE120102144 and DP120104168. Ying Zhang was partially supported by ARC DP110104880 and UNSW ECR grant PSE1799. Muhammad Aamir Cheema was partially supported by ARC DE130101002 and DP130103405. Xuemin Lin was partially supported by ARC DP0987557, ARC DP110102937, ARC DP120104168, and NSFC61021004.

About this article

Cite this article

Zhang, W., Zhan, L., Zhang, Y. et al. Efficient top-k similarity join processing over multi-valued objects. World Wide Web 17, 285–309 (2014). https://doi.org/10.1007/s11280-012-0201-5

Received: 06 September 2012
Revised: 17 December 2012
Accepted: 28 December 2012
Published: 07 February 2013
Issue Date: May 2014
DOI: https://doi.org/10.1007/s11280-012-0201-5
