Abstract
The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string \(x\) of length \(n\) and a constant cumulative weight threshold \(1/z\), defined as the minimal probability of occurrence of factors in \(x\), we present an \(\mathcal{O}(n)\)-time algorithm for computing the prefix table of \(x\).
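To make the two notions in the abstract concrete, here is a minimal Python sketch. It is not the algorithm of this paper: `prefix_table` is the textbook linear-time computation for ordinary (non-weighted) strings, and `occurrence_probability` and `occurs` only illustrate how a factor is checked against the threshold 1/z in a weighted string. The function names, the list-of-dicts representation of a weighted string, and the use of exact fractions are all assumptions made for illustration.

```python
from fractions import Fraction


def prefix_table(s):
    """Return pi, where pi[i] is the length of the longest factor of s
    starting at position i that is also a prefix of s (pi[0] == len(s))."""
    n = len(s)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    left = right = 0  # s[left:right] is the rightmost factor known to match a prefix
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position i - left.
            pi[i] = min(right - i, pi[i - left])
        # Extend the match by explicit letter comparisons.
        while i + pi[i] < n and s[pi[i]] == s[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def occurrence_probability(w, i, u):
    """Probability that the ordinary string u occurs at position i of the
    weighted string w, given as a list of {letter: probability} dicts
    (hypothetical representation)."""
    if i + len(u) > len(w):
        return Fraction(0)
    p = Fraction(1)
    for k, letter in enumerate(u):
        p *= w[i + k].get(letter, Fraction(0))
    return p


def occurs(w, i, u, z):
    """True if u occurs at position i of w with probability at least 1/z."""
    return occurrence_probability(w, i, u) >= Fraction(1, z)


if __name__ == "__main__":
    print(prefix_table("abaab"))
    w = [
        {"a": Fraction(1)},
        {"a": Fraction(1, 2), "c": Fraction(1, 2)},
        {"a": Fraction(3, 4), "g": Fraction(1, 4)},
    ]
    print(occurrence_probability(w, 0, "aca"), occurs(w, 0, "aca", 4))
```

Each position touched by the `while` loop advances `right`, which is what keeps the ordinary prefix table linear. For a weighted string, the difficulty is that several letters may be plausible at one position, so matching cannot be reduced to single letter comparisons in the same way.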
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Barton, C., Pissis, S.P. (2015). Linear-Time Computation of Prefix Table for Weighted Strings. In: Manea, F., Nowotka, D. (eds) Combinatorics on Words. WORDS 2015. Lecture Notes in Computer Science, vol 9304. Springer, Cham. https://doi.org/10.1007/978-3-319-23660-5_7
DOI: https://doi.org/10.1007/978-3-319-23660-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23659-9
Online ISBN: 978-3-319-23660-5
eBook Packages: Computer Science, Computer Science (R0)