
Parallel Biological Sequence Comparison in Linear Space with Multiple Adjustable Bands

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10252)

Abstract

In this paper, we propose and evaluate Fickett-MM, a parallel strategy that combines the Fickett and Myers-Miller algorithms. It splits a pairwise sequence comparison into multiple comparisons of subsequences (blocks) and calculates an appropriate Fickett band for each block. Compared to Fickett, which uses a single band for the whole comparison, this approach can reduce the number of cells calculated in the dynamic programming matrix. Our adjustable multi-block strategy was integrated into stage 4 of CUDAlign, a state-of-the-art parallel tool for optimal biological sequence comparison. Fickett-MM was used to compare real DNA sequences whose sizes ranged from 10 KBP (thousands of base pairs) to 47 MBP (millions of base pairs), reaching a speedup of 59.60× over CUDAlign stage 4 in the 10 MBP × 10 MBP comparison.
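
The potential saving in calculated cells can be illustrated with a minimal counting sketch. It is not the paper's implementation: the function names, the block sizes, and the band half-widths below are hypothetical, and the band is assumed to be the set of diagonals within k of those joining a block's top-left and bottom-right corners. The sketch compares one band wide enough for the whole matrix against one band per block, where only one block needs the wide band.

```python
def band_cells(rows: int, cols: int, k: int) -> int:
    """Count cells (i, j) of a rows x cols matrix whose diagonal j - i lies
    within k of the diagonals joining the top-left and bottom-right corners."""
    lo = min(0, cols - rows) - k   # lowest diagonal kept in the band
    hi = max(0, cols - rows) + k   # highest diagonal kept in the band
    total = 0
    for i in range(rows):
        j_min = max(0, i + lo)
        j_max = min(cols - 1, i + hi)
        if j_max >= j_min:
            total += j_max - j_min + 1
    return total


def multi_band_cells(blocks) -> int:
    """Sum the band cells over blocks given as (rows, cols, k) tuples."""
    return sum(band_cells(r, c, k) for r, c, k in blocks)


# Hypothetical 1000 x 1000 comparison. A single band must use the widest
# half-width needed anywhere (assumed here to be 100).
single = band_cells(1000, 1000, 100)

# The same comparison split into four 250 x 250 blocks (assumed sizes);
# only the second block is assumed to need the wide band.
blocks = [(250, 250, 5), (250, 250, 100), (250, 250, 5), (250, 250, 5)]
multi = multi_band_cells(blocks)

print(f"single band: {single} cells, per-block bands: {multi} cells")
```

In this toy setting the per-block bands cover far fewer cells because the narrow blocks no longer pay for the widest region. In a real run the block boundaries and band widths depend on the sequences being compared.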

Author information

Corresponding author

Correspondence to Alba C. M. A. Melo.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Silva, G.H.G., Sandes, E.F.O., Teodoro, G., Melo, A.C.M.A. (2017). Parallel Biological Sequence Comparison in Linear Space with Multiple Adjustable Bands. In: Figueiredo, D., Martín-Vide, C., Pratas, D., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2017. Lecture Notes in Computer Science, vol 10252. Springer, Cham. https://doi.org/10.1007/978-3-319-58163-7_12

  • DOI: https://doi.org/10.1007/978-3-319-58163-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58162-0

  • Online ISBN: 978-3-319-58163-7

  • eBook Packages: Computer Science, Computer Science (R0)
