Efficient Algorithms for Two Extensions of LPF Table: The Power of Suffix Arrays

Crochemore, Maxime; Iliopoulos, Costas S.; Kubica, Marcin; Rytter, Wojciech; Waleń, Tomasz

doi:10.1007/978-3-642-11266-9_25

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5901)

Abstract

Suffix arrays provide a powerful data structure for answering several questions about the structure of all the factors of a string. We show how they can be used to efficiently compute two new tables storing different types of previous factors (past segments) of a string. The concept of a longest previous factor is inherent to the Ziv-Lempel factorization of strings in text compression, as well as to statistics of repetitions and symmetries. The longest previous reverse factor for a given position i is the longest factor starting at i whose reverse copy occurs before. The longest previous non-overlapping factor is the longest factor v starting at i that has an exact copy occurring before. In both cases, the previous copies of the factors are required to occur in the prefix ending at position i − 1. We design algorithms computing the table of longest previous reverse factors (LPrF table) and the table of longest previous non-overlapping factors (LPnF table). The latter table is useful for computing repetitions, while the former is a tool for extracting symmetries. Both tables are computed in linear time on any integer alphabet. The computation uses two previously computed read-only arrays, SUF and LCP, which together compose the suffix array. The tables have not been explicitly considered before, but they have several applications and are natural extensions of the LPF table, which has been studied thoroughly. Our results improve on previous ones in several ways. The running time of the computation no longer depends on the size of the alphabet, which removes a log factor. Moreover, the newly introduced tables store additional information on the structure of the string. This information helps to improve, for example, gapped palindrome detection and text compression using reverse factors.
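The following brute-force sketch makes the three table definitions concrete. It computes LPF, LPnF and LPrF directly from their definitions, using 0-based positions. It is not the linear-time suffix-array algorithm of the paper. It takes roughly cubic time in the worst case and is meant only as a small test oracle. The function names `lpf`, `lpnf`, `lprf` and the helper `_longest` are hypothetical. For LPF, the sketch assumes the usual definition, in which the earlier copy only has to start before position i and may overlap it.

```python
from typing import Callable


def _longest(y: str, i: int, qualifies: Callable[[str], bool]) -> int:
    """Largest k such that qualifies(y[i:i+k]) holds.

    The qualifying lengths are downward closed: if a factor has an earlier
    (reverse) copy, so does every prefix of it. Growing k one letter at a
    time therefore stops exactly at the maximum.
    """
    k = 0
    while i + k < len(y) and qualifies(y[i:i + k + 1]):
        k += 1
    return k


def lpf(y: str) -> list[int]:
    """Longest previous factor: the copy starts at some j < i and may overlap i."""
    # A copy of u starting at j <= i - 1 lies inside y[:i - 1 + len(u)].
    return [_longest(y, i, lambda u: u in y[:i - 1 + len(u)]) for i in range(len(y))]


def lpnf(y: str) -> list[int]:
    """Longest previous non-overlapping factor: the copy lies inside y[:i]."""
    return [_longest(y, i, lambda u: u in y[:i]) for i in range(len(y))]


def lprf(y: str) -> list[int]:
    """Longest previous reverse factor: the reversed factor lies inside y[:i]."""
    return [_longest(y, i, lambda u: u[::-1] in y[:i]) for i in range(len(y))]


if __name__ == "__main__":
    print(lpf("aaaa"))     # [0, 3, 2, 1]
    print(lpnf("aaaa"))    # [0, 1, 2, 1]
    print(lprf("abccba"))  # [0, 0, 0, 3, 2, 1]
```

For instance, `lprf("abccba")` has value 3 at position 3, because the factor `cba` starting there is the reverse of the earlier `abc`. Comparing `lpf("aaaa")` with `lpnf("aaaa")` shows how forbidding overlap with position i shortens the values.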

Research supported in part by the Royal Society, UK.

Author information

Authors and Affiliations

Dept. of Computer Science, King’s College London, London, WC2R 2LS, UK
Maxime Crochemore & Costas S. Iliopoulos
Institute of Informatics, University of Warsaw, Warsaw, Poland
Marcin Kubica, Wojciech Rytter & Tomasz Waleń
Université Paris-Est, France
Maxime Crochemore
Digital Ecosystems & Business Intelligence Institute, Curtin University of Technology, Perth, WA 6845, Australia
Costas S. Iliopoulos
Faculty of Math. and Informatics, Copernicus University, Torun, Poland
Wojciech Rytter

Editor information

Editors and Affiliations

Department of Information and Computing Sciences, Utrecht University, Padualaan 14, 3584 CH, Utrecht, The Netherlands
Jan van Leeuwen
LaBRI, Université Bordeaux 1, 351 cours de la Libération, F-33405 Talence Cedex, France
Anca Muscholl
Department of Computer Science & Applied Mathematics, Weizmann Institute, Faculty of Mathematics and Computer Science, 76100, Rehovot, Israel
David Peleg
Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, 11800 Prague 1, Czech Republic
Jaroslav Pokorný
Software Engineering, Department of Computer Science 3, RWTH Aachen University, Ahornstraße 55, D-52074 Aachen, Germany
Bernhard Rumpe

About this paper

Cite this paper

Crochemore, M., Iliopoulos, C.S., Kubica, M., Rytter, W., Waleń, T. (2010). Efficient Algorithms for Two Extensions of LPF Table: The Power of Suffix Arrays. In: van Leeuwen, J., Muscholl, A., Peleg, D., Pokorný, J., Rumpe, B. (eds) SOFSEM 2010: Theory and Practice of Computer Science. SOFSEM 2010. Lecture Notes in Computer Science, vol 5901. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11266-9_25

DOI: https://doi.org/10.1007/978-3-642-11266-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11265-2
Online ISBN: 978-3-642-11266-9
eBook Packages: Computer Science, Computer Science (R0)
