Algorithms for Computing Bidirectional Best Hit r-Window Gene Clusters

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6681)

Abstract

Genome rearrangements are large-scale mutations that shuffle the genes on a genome. Despite these rearrangements, whole-genome analysis of modern species has revealed sets of genes that are found close to one another in multiple species. These conserved gene clusters provide useful information on gene function and genome evolution. In this paper, we consider a novel gene cluster model called bidirectional best hit r-window (BBHRW), whose idea is to (a) capture the “frequency of common genes” in an r-window (an interval of at most r consecutive genes) of each genome and (b) strengthen it with the bidirectional best hit criterion. We define two variants of BBHRW, using two different similarity measures to define the “frequency of common genes” in two r-windows. The algorithmic problem is then as follows: given two genomes of lengths n and m and an integer r, compute all the BBHRW clusters. The straightforward algorithm for this problem compares all pairs of r-windows and is an \(O(nm)\) algorithm. In this paper, we present faster algorithms (SWBST and SWOT) for solving the two BBHRW variants. Algorithm SWBST is the simpler one and solves the first variant of BBHRW, while algorithm SWOT solves both variants. Both algorithms have running time \(O((n+m) r \lg r)\). The speed-up is achieved via a sliding-window approach and the use of efficient data structures. We implemented the algorithms and compared their running times for finding BBHRW clusters conserved in E. coli K-12 (2339 genes) and B. subtilis (2332 genes), with r from 1 to 30, to illustrate the speed-up achieved. We also compare the two similarity measures on these genomes to show that the choice of similarity measure is an important factor for this cluster model.
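The straightforward all-pairs baseline mentioned in the abstract can be sketched roughly as follows. This is only an illustration, not the paper's SWBST or SWOT algorithm. The abstract does not define the two similarity measures, so the `similarity` function below (size of the multiset intersection of gene labels) is a hypothetical stand-in. The sketch also assumes windows of exactly r genes and reports all ties as best hits. The names `windows`, `similarity`, and `bbh_window_pairs` are invented for this example.

```python
from collections import Counter


def windows(genome, r):
    """Windows of exactly r consecutive genes (simplifying assumption;
    the model allows intervals of at most r genes)."""
    return [tuple(genome[i:i + r]) for i in range(max(1, len(genome) - r + 1))]


def similarity(w1, w2):
    """Hypothetical similarity measure: number of gene labels shared by the
    two windows, counted with multiplicity. Not taken from the paper."""
    return sum((Counter(w1) & Counter(w2)).values())


def bbh_window_pairs(g1, g2, r):
    """Brute-force bidirectional best hits between the r-windows of g1 and g2.

    Returns index pairs (i, j) such that window j of g2 is a best hit for
    window i of g1 and vice versa (ties kept, zero-similarity pairs dropped).
    """
    ws1, ws2 = windows(g1, r), windows(g2, r)
    # score[i][j] = similarity of window i of g1 and window j of g2
    score = [[similarity(a, b) for b in ws2] for a in ws1]
    best1 = [max(row) for row in score]  # best score for each window of g1
    best2 = [max(score[i][j] for i in range(len(ws1)))  # same for g2
             for j in range(len(ws2))]
    return [(i, j)
            for i in range(len(ws1))
            for j in range(len(ws2))
            if score[i][j] > 0 and score[i][j] == best1[i] == best2[j]]


if __name__ == "__main__":
    g1 = ["a", "b", "c", "x", "y"]
    g2 = ["y", "c", "b", "a", "z"]
    print(bbh_window_pairs(g1, g2, 3))
```

The sketch evaluates every pair of windows, which is the quadratic number of comparisons that the sliding-window algorithms in the paper are designed to avoid.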

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Le, T.D., Zhang, M., Leong, H.W. (2011). Algorithms for Computing Bidirectional Best Hit r-Window Gene Clusters. In: Atallah, M., Li, XY., Zhu, B. (eds) Frontiers in Algorithmics and Algorithmic Aspects in Information and Management. Lecture Notes in Computer Science, vol 6681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21204-8_30

  • DOI: https://doi.org/10.1007/978-3-642-21204-8_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21203-1

  • Online ISBN: 978-3-642-21204-8

  • eBook Packages: Computer Science, Computer Science (R0)
