Indexing Factors with Gaps

Iliopoulos, Costas S.; Rahman, M. Sohel

doi:10.1007/s00453-007-9141-3

Published: 05 December 2007

Volume 55, pages 60–70 (2009)

Algorithmica

Abstract

Indexing of factors, or substrings, is a widely used technique in stringology and serves as a tool for solving diverse text-algorithmic problems. A gapped-factor is the concatenation of a factor of length k, a gap of length d and another factor of length k′. Such a gapped-factor is called a (k−d−k′)-gapped-factor. The problem of indexing gapped-factors was considered recently by Peterlongo et al. (In: Stringology, pp. 182–196, 2006). In particular, Peterlongo et al. devised a data structure, the gapped-factor tree (GFT), to index the gapped-factors. Given a text \(\mathcal{T}\) of length n over the alphabet Σ and the values of the parameters k, d and k′, the construction of the GFT requires O(n|Σ|) time. Once the GFT is constructed, the occurrences of a given (k−d−k′)-gapped-factor can be reported in O(k+k′+Occ) time, where Occ is the number of occurrences of that gapped-factor in \(\mathcal{T}\). In this paper, we present a new, improved indexing scheme for gapped-factors. The improvements come from two aspects. Firstly, we generalize the indexing data structure in the sense that, unlike the GFT, it is independent of the parameters k and k′. Secondly, our data structure can be constructed in O(n log^{1+ε} n) time and space, where 0<ε<1. The only price we pay is a slight increase in the query time, namely an additional log log n term.
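To make the definition of a (k−d−k′)-gapped-factor concrete, the following is a minimal brute-force sketch in Python. It is not the gapped-factor tree of Peterlongo et al., nor the data structure proposed in this paper; it is only a hash-table baseline that, like the GFT, is built for fixed k, d and k′. The function names `build_gapped_index` and `query`, and the use of 0-based positions, are assumptions made for illustration.

```python
from collections import defaultdict


def build_gapped_index(text, k, d, k2):
    """Map each (left, right) factor pair to its start positions in text.

    A (k-d-k')-gapped-factor starting at position i consists of
    text[i:i+k], a gap of d arbitrary characters, and
    text[i+k+d:i+k+d+k2]. Here k2 plays the role of k'.
    Positions are 0-based (an illustrative choice).
    """
    index = defaultdict(list)
    span = k + d + k2  # total length covered by one gapped-factor
    for i in range(len(text) - span + 1):
        left = text[i:i + k]
        right = text[i + k + d:i + span]
        index[(left, right)].append(i)
    return index


def query(index, left, right):
    """Return all start positions of the gapped-factor (left, gap, right)."""
    return index.get((left, right), [])


if __name__ == "__main__":
    idx = build_gapped_index("ACGTACGTAC", k=2, d=1, k2=2)
    print(query(idx, "AC", "TA"))  # prints [0, 4]
    print(query(idx, "CG", "AC"))  # prints [1, 5]
    print(query(idx, "AA", "AA"))  # prints []
```

This baseline has to be rebuilt whenever k, d or k′ changes, which is the kind of parameter dependence that the abstract says the new scheme removes for k and k′.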

References

[1] Agarwal, P.K., Govindarajan, S., Muthukrishnan, S.: Range searching in categorical data: Colored range searching on grid. In: Möhring, R.H., Raman, R. (eds.) ESA. Lecture Notes in Computer Science, vol. 2461, pp. 17–28. Springer, New York (2002)
[2] Allali, J., Sagot, M.-F.: The at most k-deep factor tree. Report 2004-03, Institut Gaspard Monge, Université de Marne-la-Vallée (2004)
[3] Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. In: FOCS, pp. 198–207 (2000)
[4] Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words. NATO ISI Series, pp. 85–96. Springer, Berlin (1985)
[5] Brudno, M., Chapman, M., Göttgens, B., Batzoglou, S., Morgenstern, B.: Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinf. 4, 66 (2003)
[6] Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S.: LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13(4), 721–731 (2003)
[7] Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Babai, L. (ed.) STOC, pp. 91–100. ACM, Singapore (2004)
[8] Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific, Singapore (2002)
[9] Crochemore, M., Iliopoulos, C.S., Mohamed, M., Sagot, M.-F.: Longest repeats with a block of don’t cares. Theor. Comput. Sci. 362(1–3), 248–254 (2006)
[10] Edgar, R.C.: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5), 1792–1797 (2004)
[11] Farach, M.: Optimal suffix tree construction with large alphabets. In: FOCS, pp. 137–143 (1997)
[12] Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate string joins in a database (almost) for free. In: Apers, P.M.G., Atzeni, P., Ceri, S., Paraboschi, S., Ramamohanarao, K., Snodgrass, R.T. (eds.) VLDB, pp. 491–500. Morgan Kaufmann, San Mateo (2001)
[13] Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
[14] Höhl, M., Kurtz, S., Ohlebusch, E.: Efficient multiple genome alignment. In: ISMB, pp. 312–320 (2002)
[15] Iliopoulos, C.S., McHugh, J.A.M., Peterlongo, P., Pisanti, N., Rytter, W., Sagot, M.-F.: A first approach to finding common motifs with gaps. Int. J. Found. Comput. Sci. 16(6), 1145–1154 (2005)
[16] Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP. Lecture Notes in Computer Science, vol. 2719, pp. 943–955. Springer, New York (2003)
[17] Kim, D.K., Sim, J.S., Park, H., Park, K.: Constructing suffix arrays in linear time. J. Discrete Algorithms 3(2–4), 126–142 (2005)
[18] Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. J. Discrete Algorithms 3(2–4), 143–156 (2005)
[19] Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunter II: Highly sensitive and fast homology search. Genome Inf. 14, 164–175 (2003)
[20] Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)
[21] Maaß, M.G., Nowak, J.: Text indexing with errors. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM. Lecture Notes in Computer Science, vol. 3537, pp. 21–32. Springer, New York (2005)
[22] Maaß, M.G., Nowak, J.: Text indexing with errors. J. Discrete Algorithms 5(4), 662–681 (2007). doi:10.1016/j.jda.2006.11.001, selected papers from Combinatorial Pattern Matching (CPM) 2005, December 2007
[23] Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
[24] McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23(2), 262–272 (1976)
[25] Michael, M., Dieterich, C., Vingron, M.: SiteBlast: rapid and sensitive local alignment of genomic sequences employing motif anchors. Bioinformatics 21(9), 2093–2094 (2005)
[26] Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: SODA, pp. 657–666 (2002)
[27] Navarro, G., Sutinen, E., Tanninen, J., Tarhio, J.: Indexing text with approximate q-grams. In: Giancarlo, R., Sankoff, D. (eds.) CPM. Lecture Notes in Computer Science, vol. 1848, pp. 350–363. Springer, New York (2000)
[28] Peterlongo, P., Allali, J., Sagot, M.-F.: The gapped-factor tree. In: Holub, J., Zdárek, J. (eds.) Stringology, pp. 182–196. Czech Technical University, Prague (2006)
[29] Rahman, M.S., Iliopoulos, C.S.: Indexing factors with gaps. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plasil, F. (eds.) SOFSEM (1). Lecture Notes in Computer Science, vol. 4362, pp. 465–474. Springer, New York (2007)
[30] Rahman, M.S., Iliopoulos, C.S., Lee, I., Mohamed, M., Smyth, W.F.: Finding patterns with variable length gaps or don’t cares. In: Chen, D.Z., Lee, D.T. (eds.) COCOON. Lecture Notes in Computer Science, vol. 4112, pp. 146–155. Springer, New York (2006)
[31] Sutinen, E., Tarhio, J.: On using q-gram locations in approximate string matching. In: Spirakis, P.G. (ed.) ESA. Lecture Notes in Computer Science, vol. 979, pp. 327–340. Springer, Berlin (1995)
[32] Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)

Author information

Authors and Affiliations

Algorithm Design Group, Department of Computer Science, King’s College London, Strand, London, WC2R 2LS, England, UK
Costas S. Iliopoulos & M. Sohel Rahman

Corresponding author

Correspondence to M. Sohel Rahman.

Additional information

Preliminary version appeared in [29].

C.S. Iliopoulos is supported by EPSRC and Royal Society grants.

M.S. Rahman is supported by the Commonwealth Scholarship Commission in the UK under the Commonwealth Scholarship and Fellowship Plan (CSFP).

M.S. Rahman is on leave from Department of CSE, BUET, Dhaka 1000, Bangladesh.

About this article

Cite this article

Iliopoulos, C.S., Rahman, M.S. Indexing Factors with Gaps. Algorithmica 55, 60–70 (2009). https://doi.org/10.1007/s00453-007-9141-3

Received: 27 November 2006
Accepted: 21 November 2007
Published: 05 December 2007
Issue Date: September 2009
DOI: https://doi.org/10.1007/s00453-007-9141-3
