Abstract
In metagenomics, population sequencing is an approach for recovering genomic sequences from a genetically diverse environment. Combined with recently developed next-generation sequencing platforms, metagenomic data analysis has greatly enlarged the size of sequencing datasets and decreased their cost. Complete and accurate assembly of the sequenced reads from an environmental sample improves the efficiency of functional and taxonomic classification of genomes. A common bottleneck of the available tools is the high computing requirement for efficiently assembling the vast amounts of data generated by large-scale sequencing projects. To address this limitation, we developed a parallel strategy to accelerate computation and boost accuracy. We also present an instance of this strategy for a state-of-the-art assembly tool, Genovo, on the Apache Hadoop platform. To demonstrate the capability of our approach, we compared its performance with that of two other short-read assembly programs on a series of synthetic and real datasets created using the 454 platform, the largest of which has 683k reads. Under the parallel strategy, our method's reconstruction of bases outperformed the other tools in both speed and several assembly evaluation metrics.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, X., Ding, X., Meng, Y., Pan, Y. (2013). Cloud Computing for De Novo Metagenomic Sequence Assembly. In: Cai, Z., Eulenstein, O., Janies, D., Schwartz, D. (eds) Bioinformatics Research and Applications. ISBRA 2013. Lecture Notes in Computer Science(), vol 7875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38036-5_20
DOI: https://doi.org/10.1007/978-3-642-38036-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38035-8
Online ISBN: 978-3-642-38036-5
eBook Packages: Computer Science (R0)