Data-Aware Partitioning Schema in MapReduce

Junjie, Liang; Qiongni, Liu; Li, Yin; Dunhui, Yu

doi:10.1007/978-3-662-46248-5_12

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 503)

Abstract

Building on the strengths of the MapReduce programming model for parallel processing of data and tasks on large-scale clusters, this paper proposes a data-aware partitioning schema in MapReduce for large-scale high-dimensional data. The schema optimizes the method for partitioning data into blocks so that the blocks contribute equally to the computation in MapReduce. Using a two-stage data partitioning strategy, clustering followed by partitioning, the data are distributed uniformly across data blocks. Experiments show that the data-aware partitioning schema is effective and extensible in improving the query efficiency of high-dimensional data.
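The abstract gives no algorithmic detail, so the following is only a minimal sketch of what a two-stage "cluster, then partition" scheme could look like, not the authors' method. It assumes plain k-means for the clustering stage and fixed-capacity blocks ordered by distance to the cluster center for the partitioning stage; the names `kmeans`, `two_stage_partition`, `k`, and `block_size` are hypothetical and do not come from the paper.

```python
import math
import random
from collections import defaultdict


def nearest(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iters=10, seed=0):
    """Stage 1 (assumed): plain k-means over a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its members;
        # keep the previous center if a cluster ended up empty.
        centers = [
            tuple(sum(axis) / len(groups[i]) for axis in zip(*groups[i]))
            if groups[i] else centers[i]
            for i in range(k)
        ]
    return centers


def two_stage_partition(points, k, block_size, seed=0):
    """Stage 2 (assumed): cut each cluster into blocks of at most block_size points.

    Returns a list of (cluster_id, block) pairs. Points inside a cluster are
    ordered by distance to the cluster center, so each block covers a ring
    around that center and no block exceeds the size cap.
    """
    centers = kmeans(points, k, seed=seed)
    clusters = defaultdict(list)
    for p in points:
        clusters[nearest(p, centers)].append(p)

    blocks = []
    for cid in sorted(clusters):
        members = sorted(clusters[cid], key=lambda p: math.dist(p, centers[cid]))
        for start in range(0, len(members), block_size):
            blocks.append((cid, members[start:start + block_size]))
    return blocks
```

In a MapReduce setting, each `(cluster_id, block)` pair could serve as one input split, so that mappers receive similarly sized workloads; how the paper actually assigns blocks to tasks is not stated in the abstract.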

Author information

Authors and Affiliations

School of Computer Science and Information Engineering, Hubei University, Wuhan, 430062, China
Liang Junjie, Liu Qiongni, Yin Li & Yu Dunhui

Editor information

Editors and Affiliations

Harbin Institute of Technology, Harbin, China
Hongzhi Wang & Wanxiang Che
School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
Haoliang Qi & Zhongyuan Han
Northeast Forestry University, Harbin, China
Zhaowen Qiu
Heilongjiang Institute of Technology, Harbin, China
Leilei Kong
Harbin Engineering University, China
Junyu Lin
Zhongkeyunhai Company, Harbin, China
Zeguang Lu

About this paper

Cite this paper

Junjie, L., Qiongni, L., Li, Y., Dunhui, Y. (2015). Data-Aware Partitioning Schema in MapReduce. In: Wang, H., et al. Intelligent Computation in Big Data Era. ICYCSEE 2015. Communications in Computer and Information Science, vol 503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46248-5_12

DOI: https://doi.org/10.1007/978-3-662-46248-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46247-8
Online ISBN: 978-3-662-46248-5
eBook Packages: Computer Science, Computer Science (R0)
