Klassisches Multi-threading versus MapReduce zur Parallelisierung rechenintensiver Tasks in der Amazon Cloud

Mandl, Peter; Döschl, Alexander

doi:10.1365/s40702-017-0360-z

Classical Multi-threading Versus MapReduce to Schedule CPU-intensive Tasks in parallel Using Amazon Cloud

Spektrum
Published: 14 September 2017

Volume 55, pages 445–461, (2018)
HMD Praxis der Wirtschaftsinformatik

Abstract

This article compares CPU-intensive multi-threaded and MapReduce solutions running in the Amazon cloud on the AWS services EC2 and EMR. A simple but compute-intensive puzzle served as the case study for our experiments. To find all solutions with a brute-force method, 15! permutations had to be generated and each one tested against the rules of the puzzle. We implemented the experimental solution in Java, once with a simple multi-threaded algorithm and once with a MapReduce algorithm, and compared both on Amazon EC2/EMR clusters with respect to performance and scalability. The Hadoop processing time scaled approximately linearly (slightly sublinearly). According to our experiments, however, an assessment of scalability should also take into account the number of input splits, the hardware utilization, and other aspects. Comparing the multi-threaded solution with the MapReduce solution on Amazon EMR (Apache Hadoop) showed that the processing time of MapReduce, measured in CPU minutes, was more than 30 % higher.
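To make the brute-force idea concrete, the following is a minimal Java sketch of a multi-threaded permutation search. It is not the authors' implementation: the class name `ParallelPermutationSearch`, the method `countMatches`, the partitioning by first element, and the placeholder rule are all assumptions chosen for illustration, since the puzzle rules and the actual work distribution are not described here. Each task fixes the first element, walks the remaining positions in lexicographic order, and counts hits in a shared `LongAdder`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Predicate;

public class ParallelPermutationSearch {

    /** Advances a[from..] to its next lexicographic permutation; returns false after the last one. */
    static boolean nextPermutation(int[] a, int from) {
        int i = a.length - 2;
        while (i >= from && a[i] >= a[i + 1]) i--;      // find the rightmost ascent
        if (i < from) return false;                     // suffix is fully descending: done
        int j = a.length - 1;
        while (a[j] <= a[i]) j--;                       // rightmost element larger than the pivot
        swap(a, i, j);
        for (int l = i + 1, r = a.length - 1; l < r; l++, r--) swap(a, l, r); // reverse the tail
        return true;
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    /** Counts the permutations of 0..n-1 accepted by the (hypothetical) rule, one task per first element. */
    static long countMatches(int n, int threads, Predicate<int[]> rule) throws Exception {
        LongAdder hits = new LongAdder();               // low-contention shared counter
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (int first = 0; first < n; first++) {
                final int f = first;
                tasks.add(pool.submit(() -> {
                    int[] perm = new int[n];
                    perm[0] = f;                        // this task owns all permutations starting with f
                    for (int v = 0, k = 1; v < n; v++) {
                        if (v != f) perm[k++] = v;      // remaining values in ascending order
                    }
                    do {
                        if (rule.test(perm)) hits.increment();
                    } while (nextPermutation(perm, 1));
                }));
            }
            for (Future<?> task : tasks) task.get();    // wait for completion and surface failures
        } finally {
            pool.shutdown();
        }
        return hits.sum();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder rule, not the puzzle's: first element smaller than last element.
        long count = countMatches(8, 4, p -> p[0] < p[p.length - 1]);
        System.out.println(count); // prints 20160, i.e. half of 8! = 40320
    }
}
```

The sketch uses n = 8 so that it finishes instantly; at n = 15 the same loop would visit 15! = 1,307,674,368,000 permutations, which is why the distribution of the work across cores or cluster nodes matters.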

Notes

See http://stackoverflow.com/a/14444037. Accessed: 3 June 2017.
See https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/LongAdder.html. Accessed: 19 June 2017.
Amdahl's law states that as more processors are added, the achievable time gain becomes ever smaller and can even turn into a time loss, because parallelization has no effect on the program parts that must run sequentially (initialization, synchronization, etc.).
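
The note on Amdahl's law can be written out in its usual textbook form. The symbols below are assumptions introduced for illustration and do not come from the article: $p$ is the fraction of the run time that can be parallelized, $n$ is the number of processors, and $S(n)$ is the resulting speedup.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}

% Worked example with assumed values p = 0.95 and n = 16:
S(16) = \frac{1}{0.05 + \dfrac{0.95}{16}} = \frac{1}{0.109375} \approx 9.14,
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{0.05} = 20
```

In this basic form the speedup only flattens as $n$ grows. An actual time loss, as the note mentions, appears only if a coordination overhead that grows with $n$ is added to the denominator, which is an extension beyond the classic formula.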

Author information

Authors and Affiliations

Fakultät für Informatik und Mathematik, Competence Center Wirtschaftsinformatik, Hochschule für angewandte Wissenschaften München, Lothstraße 34, 80334, München, Deutschland
Peter Mandl & Alexander Döschl

Corresponding author

Correspondence to Peter Mandl.

About this article

Cite this article

Mandl, P., Döschl, A. Klassisches Multi-threading versus MapReduce zur Parallelisierung rechenintensiver Tasks in der Amazon Cloud. HMD 55, 445–461 (2018). https://doi.org/10.1365/s40702-017-0360-z

Received: 27 June 2017
Accepted: 06 September 2017
Published: 14 September 2017
Issue Date: April 2018
DOI: https://doi.org/10.1365/s40702-017-0360-z
