Abstract
BLAST is one of the most popular computational biology tools. Its execution cost depends heavily on database size, which has grown considerably with recent advances in sequencing methods. BLAST evaluation in distributed and parallel environments, such as PC clusters and Grids, has been widely investigated in order to improve performance. This work evaluates a replicated allocation of the sequence database in which each copy is also physically fragmented. We investigate two dynamic workload balancing methods designed around this database allocation strategy. Preliminary practical results show that we achieve both a balanced workload and very good performance. We briefly discuss ideas that would make our approach feasible for Grid computational environments.
Work partially funded by CNPq-INRIA (GriData project).
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Sousa, D.X., Lifschitz, S., Valduriez, P. (2008). BLAST Distributed Execution on Partitioned Databases with Primary Fragments. In: Palma, J.M.L.M., Amestoy, P.R., Daydé, M., Mattoso, M., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2008. VECPAR 2008. Lecture Notes in Computer Science, vol 5336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92859-1_48
DOI: https://doi.org/10.1007/978-3-540-92859-1_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92858-4
Online ISBN: 978-3-540-92859-1
eBook Packages: Computer Science, Computer Science (R0)