An Efficient Top-k Spatial Join Query Processing Algorithm on Big Spatial Data

Qiao, Baiyou; Hu, Bing; Qiao, Xiyu; Yao, Laigang; Zhu, Junhai; Wu, Gang

doi:10.1007/978-3-030-26075-0_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11642)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

We propose an efficient top-k spatial join query processing algorithm for big spatial data on the Spark platform. The algorithm first divides the whole data space into equal-sized cells using a grid partitioning method. Spatial objects in the two data sets are then assigned to these cells by projection and replication operations, respectively, while a filtering operation speeds up the processing. Next, an R-tree based local top-k spatial join algorithm computes the top-k candidate results in each cell; it extends the traditional R-tree index and combines it with threshold filtering techniques to reduce communication and computation costs, thereby speeding up query processing. Experimental results on synthetic data sets show that the proposed algorithm significantly outperforms existing top-k spatial join query processing algorithms.
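
The pipeline outlined in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: objects are axis-aligned rectangles, the score of an object in the first data set is the number of objects in the second data set that it intersects, "projection" means assigning each object of the first data set to the one cell containing its center, and "replication" means copying each object of the second data set to every cell that could hold an intersecting partner. Spark, the extended R-tree, and threshold filtering are all replaced here by plain Python and a brute-force scan per cell; every function name is hypothetical.

```python
import heapq
from collections import defaultdict

# Rectangles are (xmin, ymin, xmax, ymax) tuples. All names are illustrative.

def intersects(a, b):
    """True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cell_of(x, y, cell):
    """Grid cell (column, row) that contains the point (x, y)."""
    return (int(x // cell), int(y // cell))

def cells_overlapping(rect, cell):
    """All grid cells touched by a rectangle."""
    c0, r0 = cell_of(rect[0], rect[1], cell)
    c1, r1 = cell_of(rect[2], rect[3], cell)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]

def top_k_spatial_join(R, S, k, cell):
    """Return up to k (score, id) pairs for the objects of R that
    intersect the most objects of S, best first."""
    if not R:
        return []
    # Half of the largest width and height found in R. Growing an S
    # rectangle by this margin guarantees it covers the center of every
    # R rectangle it intersects.
    half_w = max(r[2] - r[0] for r in R.values()) / 2
    half_h = max(r[3] - r[1] for r in R.values()) / 2

    # Projection (assumed meaning): each R object goes to exactly one cell.
    grid_r = defaultdict(list)
    for rid, r in R.items():
        center = ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)
        grid_r[cell_of(center[0], center[1], cell)].append(rid)

    # Replication (assumed meaning): each S object goes to every cell its
    # grown rectangle touches. Cells holding no R object are skipped, a
    # simple stand-in for the filtering step.
    grid_s = defaultdict(list)
    for sid, s in S.items():
        grown = (s[0] - half_w, s[1] - half_h, s[2] + half_w, s[3] + half_h)
        for c in cells_overlapping(grown, cell):
            if c in grid_r:
                grid_s[c].append(sid)

    # Local step: brute-force count per cell, keep the k best per cell.
    candidates = []
    for c, rids in grid_r.items():
        scored = [
            (sum(intersects(R[rid], S[sid]) for sid in grid_s.get(c, [])), rid)
            for rid in rids
        ]
        candidates.extend(heapq.nlargest(k, scored))

    # Global step: merge the per-cell candidates into the final top-k.
    return heapq.nlargest(k, candidates)

R = {"r1": (0, 0, 4, 4), "r2": (8, 8, 12, 12), "r3": (20, 20, 22, 22)}
S = {"s1": (1, 1, 2, 2), "s2": (3, 3, 9, 9),
     "s3": (11, 11, 30, 30), "s4": (9, 9, 10, 10)}
result = top_k_spatial_join(R, S, k=2, cell=10)
print(result)
```

Because each object of the first data set lives in exactly one cell under this reading, its score is complete within that cell, so merging the per-cell top-k lists yields the global top-k without summing partial counts across cells.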

Acknowledgements

This research was supported by the National Key R&D Program of China (Nos. 2016YFC1401900 and 2018YFB1004402) and the National Natural Science Foundation of China (Nos. 61872072 and 61073063).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Northeastern University, Shenyang, China
Baiyou Qiao, Bing Hu, Xiyu Qiao, Laigang Yao, Junhai Zhu & Gang Wu

Corresponding author

Correspondence to Baiyou Qiao.

Editor information

Editors and Affiliations

University of Electronic Science and Technology of China, Chengdu, China
Jie Shao
Hong Kong Polytechnic University, Hong Kong, China
Man Lung Yiu
The University of Tokyo, Tokyo, Japan
Masashi Toyoda
Zhejiang University, Hangzhou, China
Dongxiang Zhang
National University of Singapore, Singapore, Singapore
Wei Wang
Peking University, Beijing, China
Bin Cui

About this paper

Cite this paper

Qiao, B., Hu, B., Qiao, X., Yao, L., Zhu, J., Wu, G. (2019). An Efficient Top-k Spatial Join Query Processing Algorithm on Big Spatial Data. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11642. Springer, Cham. https://doi.org/10.1007/978-3-030-26075-0_21

DOI: https://doi.org/10.1007/978-3-030-26075-0_21
Published: 17 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26074-3
Online ISBN: 978-3-030-26075-0
eBook Packages: Computer Science, Computer Science (R0)
