
Data-Aware Partitioning Schema in MapReduce

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 503)

Abstract

Building on the advantages of the MapReduce programming model for parallel processing of data and tasks on large-scale clusters, this paper proposes a data-aware partitioning schema in MapReduce for large-scale high-dimensional data. The schema optimizes how data blocks that contribute equally to the computation are partitioned in MapReduce. Using a two-stage data partitioning strategy, the data are distributed uniformly into data blocks by clustering and then partitioning. The experiments show that the data-aware partitioning schema is effective and extensible for improving the query efficiency of high-dimensional data.
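The abstract does not spell out the two stages, so the following is only a minimal sketch of the general idea of clustering first and partitioning second, not the authors' algorithm. The function names `nearest` and `two_stage_partition`, the random pivot selection, the Euclidean distance, and the ordering of each cluster by distance to its pivot are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def nearest(point, pivots):
    """Index of the pivot closest to point (Euclidean distance)."""
    return min(range(len(pivots)), key=lambda i: math.dist(point, pivots[i]))


def two_stage_partition(points, num_clusters, block_size, seed=0):
    """Return a list of blocks, each holding at most block_size points.

    Illustrative only: the pivot choice and the split rule are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    pivots = rng.sample(points, num_clusters)  # hypothetical pivot choice

    # Stage 1 (clustering): group each point with its nearest pivot.
    clusters = defaultdict(list)
    for p in points:
        clusters[nearest(p, pivots)].append(p)

    # Stage 2 (partitioning): order each cluster by distance to its pivot
    # and cut it into fixed-size blocks so no block exceeds block_size.
    blocks = []
    for idx, members in sorted(clusters.items()):
        members.sort(key=lambda p: math.dist(p, pivots[idx]))
        for start in range(0, len(members), block_size):
            blocks.append(members[start:start + block_size])
    return blocks
```

In this sketch each block contains points from a single cluster and never exceeds `block_size`, so a map task assigned one block would see a bounded, spatially coherent slice of the data. The last block of each cluster may be smaller than the rest, so the distribution is balanced only approximately.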




Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Junjie, L., Qiongni, L., Li, Y., Dunhui, Y. (2015). Data-Aware Partitioning Schema in MapReduce. In: Wang, H., et al. Intelligent Computation in Big Data Era. ICYCSEE 2015. Communications in Computer and Information Science, vol 503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46248-5_12


  • DOI: https://doi.org/10.1007/978-3-662-46248-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46247-8

  • Online ISBN: 978-3-662-46248-5

  • eBook Packages: Computer Science, Computer Science (R0)
