Linear-Time Computation of Prefix Table for Weighted Strings

  • Conference paper
Combinatorics on Words (WORDS 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9304)

Included in the following conference series:

  • International Conference on Combinatorics on Words

Abstract

The prefix table of a string is one of the most fundamental data structures in algorithms on strings: for each position of the string, it stores the length of the longest factor starting at that position that is also a prefix of the string. It can be computed in time linear in the length of the string, and hence it can be used efficiently for locating patterns or for searching for regularities in strings. A weighted string is a string in which a set of letters may occur at each position, each with its respective occurrence probability. Weighted strings, also known as position weight matrices, naturally arise in many biological contexts; for example, they provide a way to model approximation among occurrences of the same DNA segment. In this article, given a weighted string \(x\) of length \(n\) and a constant cumulative weight threshold \(1/z\), defined as the minimal probability of occurrence of factors in \(x\), we present an \(\mathcal {O}(n)\)-time algorithm for computing the prefix table of \(x\).
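For an ordinary (non-weighted) string, the linear-time computation of the prefix table mentioned in the abstract can be sketched as follows. This is the classical window-reuse technique (often called the Z-algorithm), not the weighted-string algorithm of this paper; the function name `prefix_table` and the convention \(P[0] = n\) are assumptions made for illustration.

```python
from typing import List


def prefix_table(x: str) -> List[int]:
    """Classical prefix table of an ordinary (non-weighted) string.

    P[i] is the length of the longest factor of x starting at position i
    that is also a prefix of x; by convention P[0] = len(x).
    """
    n = len(x)
    if n == 0:
        return []
    P = [0] * n
    P[0] = n
    # x[l:r] is the match with a prefix of x reaching furthest right among
    # those found so far (it equals x[0:r - l]).
    l, r = 0, 0
    for i in range(1, n):
        if i < r:
            # Inside the window, x[i:r] equals x[i - l:r - l], so the value
            # already computed at position i - l gives a lower bound.
            P[i] = min(r - i, P[i - l])
        # Extend the match letter by letter past what is already known.
        while i + P[i] < n and x[P[i]] == x[i + P[i]]:
            P[i] += 1
        if i + P[i] > r:
            l, r = i, i + P[i]
    return P


print(prefix_table("aabaab"))  # [6, 1, 0, 3, 1, 0]
```

Successful letter comparisons in the inner loop only happen at or beyond the right end `r` of the current window, and `r` then moves past them and never decreases, so the total work is \(\mathcal{O}(n)\).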

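The abstract defines the threshold \(1/z\) as the minimal probability of occurrence of factors in \(x\). A minimal sketch of that notion follows, under stated assumptions: each position of the weighted string is stored as a dict from letters to probabilities (a hypothetical representation), and positions are treated as independent, so the probability that a string occurs at a position is the product of its letters' probabilities, the usual model in the weighted-string literature. The names `occurrence_probability` and `occurs_above_threshold` are hypothetical, and this does not reproduce the paper's prefix-table algorithm, whose matching rule for weighted factors is not stated in the abstract.

```python
from fractions import Fraction
from typing import Dict, List

# Hypothetical representation of a weighted string: one dict per position,
# mapping each letter that may occur there to its probability.
WeightedString = List[Dict[str, Fraction]]


def occurrence_probability(x: WeightedString, u: str, i: int) -> Fraction:
    """Probability that the ordinary string u occurs in x at position i.

    Positions are assumed independent, so this is the product of the
    probabilities of u[j] at position i + j; 0 if u does not fit in x.
    """
    if i < 0 or i + len(u) > len(x):
        return Fraction(0)
    p = Fraction(1)
    for j, letter in enumerate(u):
        # A letter absent from the position's dict has probability 0.
        p *= x[i + j].get(letter, Fraction(0))
    return p


def occurs_above_threshold(x: WeightedString, u: str, i: int, z: int) -> bool:
    """True if u occurs at position i with probability at least 1/z."""
    return occurrence_probability(x, u, i) >= Fraction(1, z)


# A small weighted string over {A, C, G, T} with threshold 1/z = 1/4.
example: WeightedString = [
    {"A": Fraction(1)},
    {"A": Fraction(1, 2), "C": Fraction(1, 2)},
    {"G": Fraction(3, 4), "T": Fraction(1, 4)},
]
print(occurrence_probability(example, "ACG", 0))     # 3/8
print(occurs_above_threshold(example, "ACT", 0, 4))  # False (1/8 < 1/4)
```

Exact `Fraction` arithmetic is used so that comparisons against \(1/z\) are not affected by floating-point rounding.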
Author information

Correspondence to Solon P. Pissis.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Barton, C., Pissis, S.P. (2015). Linear-Time Computation of Prefix Table for Weighted Strings. In: Manea, F., Nowotka, D. (eds.) Combinatorics on Words. WORDS 2015. Lecture Notes in Computer Science, vol. 9304. Springer, Cham. https://doi.org/10.1007/978-3-319-23660-5_7

  • DOI: https://doi.org/10.1007/978-3-319-23660-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23659-9

  • Online ISBN: 978-3-319-23660-5

  • eBook Packages: Computer Science, Computer Science (R0)