Abstract
Building on the strengths of the MapReduce programming model for parallel computation and for processing data and tasks on large-scale clusters, this paper proposes a data-aware partitioning schema in MapReduce for large-scale, high-dimensional data. The schema optimizes the partitioning of data blocks that make the same contribution to the computation in MapReduce. Using a two-stage data partitioning strategy, clustering followed by partitioning, the data are distributed uniformly across data blocks. Experiments show that the data-aware partitioning schema is effective and extensible for improving the query efficiency of high-dimensional data.
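The abstract gives no algorithmic detail, so the following is only a minimal sketch of what a two-stage "cluster, then partition" strategy could look like, not the authors' method. It assumes randomly sampled pivots, Euclidean distance, and a fixed block capacity; the names `two_stage_partition`, `num_pivots`, and `block_size` are hypothetical and chosen for illustration.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_partition(points, num_pivots, block_size, seed=0):
    """Return a list of (cluster_id, block) pairs.

    Assumption (not from the paper): pivots are a random sample of the
    input, and each cluster is cut into blocks of at most `block_size`.
    """
    rng = random.Random(seed)
    pivots = rng.sample(points, num_pivots)

    # Stage 1 (clustering): assign every point to its nearest pivot.
    clusters = defaultdict(list)
    for p in points:
        nearest = min(range(num_pivots), key=lambda i: euclidean(p, pivots[i]))
        clusters[nearest].append(p)

    # Stage 2 (partitioning): order each cluster by distance to its pivot
    # and cut it into bounded-size blocks, so no block exceeds the cap.
    blocks = []
    for cid in sorted(clusters):
        members = sorted(clusters[cid], key=lambda p: euclidean(p, pivots[cid]))
        for start in range(0, len(members), block_size):
            blocks.append((cid, members[start:start + block_size]))
    return blocks


# Usage: 1,000 synthetic 8-dimensional points, 4 pivots, blocks of up to 50.
data_rng = random.Random(42)
data = [tuple(data_rng.random() for _ in range(8)) for _ in range(1000)]
result = two_stage_partition(data, num_pivots=4, block_size=50)
print(len(result), "blocks; largest holds", max(len(b) for _, b in result))
```

In a MapReduce setting, the `(cluster_id, block index)` pair could serve as the shuffle key so that each reducer receives blocks of bounded size; that mapping is likewise an assumption here rather than something the abstract states.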
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Junjie, L., Qiongni, L., Li, Y., Dunhui, Y. (2015). Data-Aware Partitioning Schema in MapReduce. In: Wang, H., et al. Intelligent Computation in Big Data Era. ICYCSEE 2015. Communications in Computer and Information Science, vol 503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46248-5_12
DOI: https://doi.org/10.1007/978-3-662-46248-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46247-8
Online ISBN: 978-3-662-46248-5
eBook Packages: Computer Science, Computer Science (R0)