Abstract
A crucial step in DNA sequence analysis is mapping short sequences generated by next-generation instruments to a reference genome. In this paper, we focus on efficient online scheduling of multi-user parallel short sequence mapping queries on a multiprocessor system. With the availability of parallel execution models, the problem becomes a moldable task scheduling problem in which the scheduler determines the number of processors used to execute each task. We propose an online scheduling algorithm to minimize the stretch of the tasks in the system. This metric is fairer to small tasks than the flow time metric and suits the nature of the problem well. Experimental evaluation on two workload scenarios indicates that the algorithm achieves significantly smaller stretch than a recent algorithm and is fairer to small tasks.
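The difference between flow time and stretch can be illustrated with a minimal sketch. It assumes the common definition of stretch as flow time divided by the time the task would take if it ran alone; the paper's exact definition for moldable tasks is not given in the abstract. The `Task` class, its field names, and the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    release: float     # time the task enters the system
    completion: float  # time the task finishes
    alone_time: float  # assumed: run time if the task had the machine to itself


def flow_time(task: Task) -> float:
    """Time spent in the system, from release to completion."""
    return task.completion - task.release


def stretch(task: Task) -> float:
    """Flow time normalized by the task's stand-alone run time (assumed definition)."""
    return flow_time(task) / task.alone_time


# Two hypothetical tasks released together; each waits behind other work.
small = Task(release=0.0, completion=10.0, alone_time=1.0)
large = Task(release=0.0, completion=110.0, alone_time=100.0)

for name, task in (("small", small), ("large", large)):
    print(f"{name}: flow time = {flow_time(task):.1f}, stretch = {stretch(task):.2f}")
```

Under these assumptions the large task has the larger flow time, yet the small task has by far the larger stretch, because its delay is large relative to its own size. A scheduler that minimizes stretch is therefore pushed to serve small tasks promptly.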
This work was supported in part by the U.S. DOE SciDAC Institute Grant DE-FC02-06ER2775; by the U.S. National Science Foundation under Grants CNS-0643969, OCI-0904809, OCI-0904802 and CNS-0403342; and by an allocation of computing time from the Ohio Supercomputer Center.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saule, E., Bozdağ, D., Catalyurek, U.V. (2010). A Moldable Online Scheduling Algorithm and Its Application to Parallel Short Sequence Mapping. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2010. Lecture Notes in Computer Science, vol 6253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16505-4_6
DOI: https://doi.org/10.1007/978-3-642-16505-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16504-7
Online ISBN: 978-3-642-16505-4
eBook Packages: Computer Science (R0)