Linear-Time Computation of Prefix Table for Weighted Strings

Barton, Carl; Pissis, Solon P.

doi:10.1007/978-3-319-23660-5_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9304)

Included in the following conference series:

International Conference on Combinatorics on Words

Abstract

The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an \(\mathcal{O}(n)\)-time algorithm for computing the prefix table of x.
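To make the two notions in the abstract concrete, the following Python sketch may help. `prefix_table` is the classic linear-time computation for an ordinary string. `weighted_prefix_table_naive` is a hypothetical, brute-force illustration for a weighted string, and it is not the chapter's \(\mathcal{O}(n)\)-time algorithm. It rests on an assumed reading of the abstract: a factor counts as matching a prefix when some string occurs with probability at least 1/z both at position 0 and at position i. The chapter's formal definition may differ. The function names and the list-of-dicts representation are assumptions made for illustration.

```python
from fractions import Fraction


def prefix_table(s):
    """Classic linear-time prefix table of an ordinary string.

    table[i] is the length of the longest factor starting at i
    that is also a prefix of s.
    """
    n = len(s)
    table = [0] * n
    if n == 0:
        return table
    table[0] = n
    left = right = 0  # s[left:right] is the rightmost factor known to match a prefix
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position i - left.
            table[i] = min(right - i, table[i - left])
        # Extend the match by explicit letter comparisons.
        while i + table[i] < n and s[table[i]] == s[i + table[i]]:
            table[i] += 1
        if i + table[i] > right:
            left, right = i, i + table[i]
    return table


def weighted_prefix_table_naive(x, z):
    """Brute-force prefix table of a weighted string (illustrative only).

    x is a list of dicts mapping each letter to its probability at that
    position. ASSUMPTION: a string u matches at position i when its
    occurrence probability is at least 1/z both at position 0 and at
    position i. This is not a linear-time method.
    """
    n = len(x)
    threshold = Fraction(1, z)
    table = [0] * n
    for i in range(n):
        # Each candidate string is tracked only by its two probabilities:
        # (probability at position 0, probability at position i).
        candidates = [(Fraction(1), Fraction(1))]
        length = 0
        while i + length < n and candidates:
            extended = []
            for p0, pi in candidates:
                for letter, w0 in x[length].items():
                    wi = x[i + length].get(letter, 0)
                    q0, qi = p0 * w0, pi * wi
                    if q0 >= threshold and qi >= threshold:
                        extended.append((q0, qi))
            if not extended:
                break
            candidates = extended
            length += 1
        table[i] = length
    return table


half = Fraction(1, 2)
x = [{"a": Fraction(1)}, {"a": half, "b": half},
     {"a": Fraction(1)}, {"a": half, "b": half}]
print(prefix_table("aabaaab"))
print(weighted_prefix_table_naive(x, 4))
```

Probabilities are given as `Fraction` values so that comparisons against 1/z are exact. When every position carries a single letter with probability 1 and z = 1, the naive function reduces to the ordinary prefix table, which is a quick sanity check on the assumed definition.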

Author information

Authors and Affiliations

The Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
Carl Barton
Department of Informatics, King’s College London, London, UK
Solon P. Pissis

Corresponding author

Correspondence to Solon P. Pissis.

Editor information

Editors and Affiliations

Universität Kiel, Kiel, Germany
Florin Manea
Universität Kiel, Kiel, Germany
Dirk Nowotka

About this paper

Cite this paper

Barton, C., Pissis, S.P. (2015). Linear-Time Computation of Prefix Table for Weighted Strings. In: Manea, F., Nowotka, D. (eds) Combinatorics on Words. WORDS 2015. Lecture Notes in Computer Science, vol 9304. Springer, Cham. https://doi.org/10.1007/978-3-319-23660-5_7

DOI: https://doi.org/10.1007/978-3-319-23660-5_7
Published: 27 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23659-9
Online ISBN: 978-3-319-23660-5
eBook Packages: Computer Science; Computer Science (R0)
