Efficient batch similarity join processing of social images based on arbitrary features

World Wide Web

Abstract

In this paper, we identify and solve a multi-join optimization problem for Arbitrary Feature-based social image Similarity JOINs (AFS-JOIN). Given two collections (i.e., R and S) of social images that carry visual, spatial, and textual (i.e., tag) information, multiple joins based on arbitrary features retrieve the pairs of images that are visually or textually similar, or spatially close, from different users. To address this problem, we propose three methods to facilitate multi-join processing: 1) two baseline approaches (i.e., a naïve join approach and a maximal threshold (MT)-based approach), and 2) a Batch Similarity Join (BSJ) method. In the BSJ method, m users’ join requests are first converted and grouped into m″ clusters, which correspond to m″ join boxes, where m > m″. To speed up BSJ processing, the feature distance space is first partitioned into cubes based on four segmentation schemes, and the image pairs falling in the cubes are indexed by a cube tree index. BSJ processing is thus transformed into searching, with the aid of the index, for the image pairs that fall in the cubes affected by the m″ AFS-JOINs. An extensive experimental evaluation using real and synthetic datasets shows that the proposed BSJ technique outperforms the state-of-the-art solutions.
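
The idea of answering a join by looking only at the affected cubes of the feature distance space can be illustrated with a minimal Python sketch. It is not the paper's method: it swaps in a uniform grid and a plain dictionary for the four segmentation schemes and the cube tree index, which the abstract does not detail. The names `build_cube_index`, `box_join`, and `naive_join`, the normalisation of distances to [0, 1], and the reading of a join request as a conjunction of per-feature thresholds (with 1.0 meaning "feature not used") are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def build_cube_index(pair_distances, cells_per_axis):
    """Map each grid cube to the image pairs whose distance point falls in it.

    pair_distances: {pair_id: (visual_dist, textual_dist, spatial_dist)},
    with every distance assumed to be normalised to [0, 1].
    """
    index = defaultdict(list)
    for pair_id, dists in pair_distances.items():
        # Clamp so that a distance of exactly 1.0 lands in the last cube.
        cube = tuple(min(int(d * cells_per_axis), cells_per_axis - 1) for d in dists)
        index[cube].append(pair_id)
    return index


def box_join(index, pair_distances, thresholds, cells_per_axis):
    """Return the pairs inside the join box [0, t_v] x [0, t_t] x [0, t_s].

    Only cubes that the box can touch are visited; pairs found there are
    still verified against the exact thresholds.
    """
    upper = [min(int(t * cells_per_axis), cells_per_axis - 1) for t in thresholds]
    result = []
    for cube in product(*(range(u + 1) for u in upper)):
        for pair_id in index.get(cube, ()):
            if all(d <= t for d, t in zip(pair_distances[pair_id], thresholds)):
                result.append(pair_id)
    return sorted(result)


def naive_join(pair_distances, thresholds):
    """Baseline for comparison: test every pair against the thresholds."""
    return sorted(
        pair_id
        for pair_id, dists in pair_distances.items()
        if all(d <= t for d, t in zip(dists, thresholds))
    )


# Hypothetical toy data: four image pairs from R x S with made-up distances.
pairs = {
    ("r1", "s1"): (0.10, 0.20, 0.05),
    ("r1", "s2"): (0.80, 0.10, 0.30),
    ("r2", "s1"): (0.25, 0.60, 0.90),
    ("r2", "s2"): (0.40, 0.40, 0.40),
}
index = build_cube_index(pairs, cells_per_axis=4)

# Two join requests: one uses only the visual feature, one uses all three.
visual_only = box_join(index, pairs, (0.3, 1.0, 1.0), cells_per_axis=4)
all_features = box_join(index, pairs, (0.5, 0.5, 0.5), cells_per_axis=4)
```

In this sketch each request scans its own set of cubes. A batch variant in the spirit of BSJ would first group requests with similar boxes so that each affected cube is read once for the whole group; that grouping step is not shown here.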

Notes

  1. Note that although the AFS-JOIN in this paper uses three features (i.e., visual, textual, and spatial features), it can easily be extended to support AFS-JOINs over other features, such as shape and temporal features.

Acknowledgments

This paper is partially supported by the Program of National Natural Science Foundation of China under Grant No. 61272188; the Program of Natural Science Foundation of Zhejiang Province under Grant Nos. LY13F020008 and LY13F020010; the Ministry of Education of Humanities and Social Sciences Project under Grant No. 14YJCZH235; and the National Center for International Joint Research on E-Business Information Processing (2013B01035).

Author information

Corresponding author

Correspondence to Yi Zhuang.

About this article

Cite this article

Zhuang, Y., Jiang, N., Wu, ZA. et al. Efficient batch similarity join processing of social images based on arbitrary features. World Wide Web 19, 725–753 (2016). https://doi.org/10.1007/s11280-015-0355-z

  • DOI: https://doi.org/10.1007/s11280-015-0355-z
