Efficient batch similarity join processing of social images based on arbitrary features

Zhuang, Yi; Jiang, Nan; Wu, Zhi-Ang; Cao, Jie; Ju, Chunhua

doi:10.1007/s11280-015-0355-z


Published: 09 July 2015

World Wide Web, Volume 19, pages 725–753 (2016)

Yi Zhuang¹, Nan Jiang², Zhi-Ang Wu³, Jie Cao³ & Chunhua Ju¹


Abstract

In this paper, we identify and solve a multi-join optimization problem for Arbitrary Feature-based social image Similarity JOINs (AFS-JOIN). Given two collections (i.e., R and S) of social images that carry visual, spatial and textual (i.e., tag) information, the multiple joins based on arbitrary features, requested by different users, retrieve the pairs of images that are visually or textually similar or spatially close. To address this problem, we propose three methods to facilitate multi-join processing: 1) two baseline approaches (i.e., a naïve join approach and a maximal threshold (MT)-based approach), and 2) a Batch Similarity Join (BSJ) method. In the BSJ method, given m users' join requests, the requests are first converted and grouped into m″ clusters, which correspond to m″ join boxes, where m > m″. To speed up BSJ processing, the feature distance space is first partitioned into cubes based on four segmentation schemes, and the image pairs falling in the cubes are indexed by a cube tree index. BSJ processing is thus transformed into an index-aided search for the image pairs that fall in the cubes affected by the m″ AFS-JOINs. An extensive experimental evaluation using real and synthetic datasets shows that the proposed BSJ technique outperforms the state-of-the-art solutions.
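The abstract only outlines how batch processing works, so the following Python sketch illustrates the general idea under explicit assumptions; it is not the authors' implementation. Everything in it is hypothetical: the distance functions (Euclidean for visual vectors and locations, Jaccard distance for tag sets), the reading of a join box as one upper threshold per feature (`math.inf` for an unused feature), a uniform grid standing in for both the cube tree and the four segmentation schemes, and a grouping rule (requests that constrain the same feature subset share a box) standing in for the paper's clustering of m requests into m″ join boxes. The names `CubeIndex`, `batch_join` and `naive_join` are illustrative only.

```python
import math
from collections import defaultdict

# Hypothetical feature set and distances (assumptions, not the paper's):
# visual = feature vector (Euclidean), textual = tag set (Jaccard distance),
# spatial = (x, y) location (Euclidean).
FEATURES = ("visual", "textual", "spatial")


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


DISTANCE = {"visual": math.dist, "textual": jaccard_distance, "spatial": math.dist}


def distance_vector(r, s):
    """Map an image pair (r, s) to a point in the feature distance space."""
    return tuple(DISTANCE[f](r[f], s[f]) for f in FEATURES)


def in_box(point, box):
    """A join box holds one upper threshold per feature; math.inf = unconstrained."""
    return all(d <= eps for d, eps in zip(point, box))


def naive_join(R, S, box):
    """Naive baseline: evaluate every pair of R x S for one request."""
    return {(i, j) for i, r in enumerate(R) for j, s in enumerate(S)
            if in_box(distance_vector(r, s), box)}


class CubeIndex:
    """Uniform-grid stand-in for the cube tree: each pair is bucketed by the
    cube that contains its distance vector (one assumed segmentation scheme)."""

    def __init__(self, R, S, cell):
        self.cell = cell  # cube edge length along each feature axis
        self.cubes = defaultdict(list)
        for i, r in enumerate(R):
            for j, s in enumerate(S):
                p = distance_vector(r, s)
                key = tuple(int(d // c) for d, c in zip(p, cell))
                self.cubes[key].append(((i, j), p))

    def search(self, box):
        """Return (pair, point) entries inside `box`, visiting only affected cubes."""
        hits = []
        for key, entries in self.cubes.items():
            # Cube k spans [k*c, (k+1)*c) on each axis, so it can intersect
            # [0, eps] only if its lower corner k*c does not exceed eps.
            if all(k * c <= eps for k, c, eps in zip(key, self.cell, box)):
                hits.extend(e for e in entries if in_box(e[1], box))
        return hits


def batch_join(index, requests):
    """Hypothetical batching rule: requests constraining the same feature subset
    share one join box (per-feature maximum threshold); the index is searched
    once per join box and the candidates are then filtered per request."""
    groups = defaultdict(list)
    for q, box in enumerate(requests):
        groups[tuple(math.isfinite(eps) for eps in box)].append(q)
    answers = {}
    for members in groups.values():
        join_box = tuple(max(requests[q][d] for q in members)
                         for d in range(len(FEATURES)))
        candidates = index.search(join_box)
        for q in members:
            answers[q] = {pair for pair, p in candidates if in_box(p, requests[q])}
    return answers


# Usage with toy data: two images per collection, five requests.
R = [{"visual": (0.0, 0.0), "textual": {"beach", "sea"}, "spatial": (0.0, 0.0)},
     {"visual": (1.0, 1.0), "textual": {"city"}, "spatial": (5.0, 5.0)}]
S = [{"visual": (0.1, 0.0), "textual": {"beach"}, "spatial": (0.0, 1.0)},
     {"visual": (1.0, 1.2), "textual": {"city", "night"}, "spatial": (9.0, 9.0)}]
inf = math.inf
requests = [(0.3, inf, inf), (0.15, inf, 2.0), (inf, 0.6, inf),
            (inf, inf, 7.0), (1.4, inf, inf)]
index = CubeIndex(R, S, cell=(0.5, 0.5, 2.0))
answers = batch_join(index, requests)
```

In this sketch, the first and last requests constrain only the visual feature, so they share one join box and one index search; cubes whose lower corner lies beyond a box are skipped without examining their pairs. Placing all requests in a single group would amount to one join at the per-feature maximal thresholds followed by per-request filtering, which is one plausible reading of the MT-based baseline rather than a confirmed description of it.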


Similar content being viewed by others

Bundling centre for landmark image discovery (Article, 01 December 2015)

Original Image Tracing with Image Relational Graph for Near-Duplicate Image Elimination

A decentralised approach to scene completion using distributed feature hashgram (Article, 02 December 2019)

Notes

Note that although the AFS-JOIN in this paper involves three features (i.e., visual, textual and spatial), it can easily be extended to AFS-JOINs over more features, such as shape and temporal features.


Acknowledgments

This paper is partially supported by the Program of National Natural Science Foundation of China under Grant No. 61272188; the Program of Natural Science Foundation of Zhejiang Province under Grant Nos. LY13F020008 and LY13F020010; the Ministry of Education of Humanities and Social Sciences Project under Grant No. 14YJCZH235; and the National Center for International Joint Research on E-Business Information Processing (2013B01035).

Author information

Authors and Affiliations

College of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, People’s Republic of China
Yi Zhuang & Chunhua Ju
Hangzhou First People’s Hospital, Hangzhou, People’s Republic of China
Nan Jiang
Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, People’s Republic of China
Zhi-Ang Wu & Jie Cao


Corresponding author

Correspondence to Yi Zhuang.


About this article

Cite this article

Zhuang, Y., Jiang, N., Wu, ZA. et al. Efficient batch similarity join processing of social images based on arbitrary features. World Wide Web 19, 725–753 (2016). https://doi.org/10.1007/s11280-015-0355-z


Received: 18 February 2014
Revised: 19 May 2015
Accepted: 25 May 2015
Published: 09 July 2015
Issue Date: July 2016
DOI: https://doi.org/10.1007/s11280-015-0355-z
