Atrak: a MapReduce-based data warehouse for big data

Barkhordari, Mohammadhossein; Niamanesh, Mahdi

doi:10.1007/s11227-017-2037-3

Published: 21 April 2017

Volume 73, pages 4596–4610, (2017)

The Journal of Supercomputing

Mohammadhossein Barkhordari¹ & Mahdi Niamanesh¹

Abstract

As warehouse data volumes grow, single-node solutions can no longer analyze them, so shared-nothing architectures such as MapReduce become necessary. Partitioning data across nodes in MapReduce, however, creates node connectivity issues, network congestion, poor use of node memory capacity and inefficient use of processing power. In addition, dimensions and measures cannot be changed without modifying previously stored data, and big dimensions are difficult to manage. This paper proposes a method called Atrak, which uses a unified data format to make Mapper nodes independent and thereby addresses these data management problems. The method can be applied to star schema data warehouse models with distributive measures. Atrak increases query execution speed by exploiting node independence and the proper use of MapReduce. The proposed method was compared with established systems such as Hive, Spark-SQL, HadoopDB and Flink, and simulation results confirm its improved query execution speed. Data unification in MapReduce can also be applied in other fields, such as data mining and graph processing.
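The restriction to distributive measures can be illustrated with a small sketch. A measure such as SUM or COUNT is distributive when it can be computed on each data partition separately and the partial results combined into the global answer. The Python below is only an illustration of that general property under assumed data, not the Atrak implementation: the rows, the (year, category, amount) layout and the function names `map_partition` and `reduce_partials` are hypothetical.

```python
from collections import defaultdict

# Hypothetical denormalized rows: each row carries its own dimension
# values (year, category) next to the measure (amount), so a mapper
# needs no data from any other node. This layout is an assumption.
PARTITIONS = [
    [("2017", "books", 10.0), ("2017", "music", 4.0)],
    [("2017", "books", 5.0), ("2016", "books", 7.0)],
]


def map_partition(rows):
    """Compute partial SUM and COUNT per (year, category) on one node."""
    partial = defaultdict(lambda: [0.0, 0])
    for year, category, amount in rows:
        acc = partial[(year, category)]
        acc[0] += amount  # running sum
        acc[1] += 1       # running count
    return dict(partial)


def reduce_partials(partials):
    """Merge per-node partials; valid because SUM and COUNT are distributive."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (part_sum, part_count) in partial.items():
            total[key][0] += part_sum
            total[key][1] += part_count
    return {key: tuple(value) for key, value in total.items()}


# Each partition is aggregated independently, then the results are merged.
result = reduce_partials(map_partition(p) for p in PARTITIONS)
```

Merging the per-partition results gives the same totals as a single pass over all rows, which is what lets each mapper work without contacting the others. A holistic measure such as the median would not merge this way.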

Notes

Mohammad Hossein Barkhordari Query Language.

Author information

Authors and Affiliations

Advance Information System Research Group for Information and Communication Technology Research Centre, No 5, Saeedialley, College Intersection, Enghelab Street, Tehran, 1599616313, Iran
Mohammadhossein Barkhordari & Mahdi Niamanesh

Corresponding author

Correspondence to Mohammadhossein Barkhordari.

About this article

Cite this article

Barkhordari, M., Niamanesh, M. Atrak: a MapReduce-based data warehouse for big data. J Supercomput 73, 4596–4610 (2017). https://doi.org/10.1007/s11227-017-2037-3

Published: 21 April 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s11227-017-2037-3
