Analysis and improvement of map-reduce data distribution in read mapping applications

Espinosa, A.; Hernandez, P.; Moure, J. C.; Protasio, J.; Ripoll, A.

doi:10.1007/s11227-012-0792-8

Published: 08 June 2012

Volume 62, pages 1305–1317 (2012)

The Journal of Supercomputing

Abstract

The map-reduce paradigm has proven to be a simple and feasible way of filtering and analyzing large data sets in cloud and cluster systems. Algorithms designed for the paradigm must implement regular data distribution patterns to ensure appropriate use of resources. Good scalability and performance in Map-Reduce applications depend greatly on the design of regular intermediate data generation-consumption patterns at the map and reduce phases. We describe the data distribution patterns found in current Map-Reduce read mapping bioinformatics applications and show some data decomposition principles that greatly improve their scalability and performance.
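As a rough illustration of what an irregular intermediate data pattern can look like, the following minimal sketch counts how many map-output pairs each reducer would receive when reads are keyed by fixed-length seeds. Everything in it is an assumption made for illustration and is not taken from the article: the seed-based keying, the seed length `k`, the CRC32 partitioner, the toy reads, and the function names `map_seeds`, `partition`, `reducer_loads`, and `imbalance` are all hypothetical.

```python
from collections import Counter
import zlib


def map_seeds(read_id, sequence, k=4):
    """Map step (assumed): emit one (seed, (read_id, offset)) pair per k-mer."""
    for offset in range(len(sequence) - k + 1):
        yield sequence[offset:offset + k], (read_id, offset)


def partition(key, num_reducers):
    """Assign a key to a reducer using a deterministic hash (CRC32)."""
    return zlib.crc32(key.encode("ascii")) % num_reducers


def reducer_loads(reads, num_reducers, k=4):
    """Count the intermediate pairs each reducer would receive."""
    loads = Counter({r: 0 for r in range(num_reducers)})
    for read_id, sequence in reads.items():
        for seed, _ in map_seeds(read_id, sequence, k):
            loads[partition(seed, num_reducers)] += 1
    return loads


def imbalance(loads):
    """Heaviest reducer load divided by the mean load (1.0 means even)."""
    mean = sum(loads.values()) / len(loads)
    return max(loads.values()) / mean if mean else 1.0


# Toy input, invented for this sketch.
reads = {
    "r1": "ACGTACGTAC",
    "r2": "AAAAAAAAAA",  # low-complexity read: every seed is the same key
    "r3": "TTGACCGTAA",
}
loads = reducer_loads(reads, num_reducers=4)
print(dict(loads), round(imbalance(loads), 2))
```

In this toy setup, a repetitive read sends all of its pairs to a single reducer, so the ratio reported by `imbalance` rises above 1.0. That is one simple way to see why the regularity of the intermediate key distribution affects how evenly the reduce phase uses its resources.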

Acknowledgements

We thank Eduard Ayguade, David Carrera, and the staff at the Barcelona Supercomputing Center (BSC) for their help and support in using the IBM Blade computer cluster.

This paper was supported by Consolider Project CSD2007-00050 of the Spanish Ministerio de Ciencia y Tecnologia.

Author information

Authors and Affiliations

Computer Architecture and Operating Systems Department, Universitat Autonoma de Barcelona, 08193, Bellaterra, Spain
A. Espinosa, P. Hernandez, J. C. Moure, J. Protasio & A. Ripoll

Corresponding author

Correspondence to A. Espinosa.

About this article

Cite this article

Espinosa, A., Hernandez, P., Moure, J.C. et al. Analysis and improvement of map-reduce data distribution in read mapping applications. J Supercomput 62, 1305–1317 (2012). https://doi.org/10.1007/s11227-012-0792-8

Issue Date: December 2012
