Skip to main content

On Context-Diverse Repeats and Their Incremental Computation

  • Conference paper
Language and Automata Theory and Applications (LATA 2014)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8370))

Abstract

The context in which a substring appears is an important notion to identify – for example – its semantic meaning. However, existing classes of repeats fail to take this into account directly. We present here xkcd-repeats, a new family of repeats characterized by the number of different symbols at the left and right of their occurrences. These repeats include as special extreme cases maximal and super-maximal repeats.

We give sufficient and necessary condition to bound their number linearly in the size of the sequence, and show an optimal algorithm that computes them in linear time – given a suffix array –, independent on the size of the alphabet, as well as two other algorithms that are faster in practice.

Additionally, we provide an independent and general framework that allows to compute these (and other) repeats incrementally; extending the application space of repeats in a streaming framework.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2, 53–86 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  2. Apostolico, A.: Of maps bigger than the empire (invited paper). In: SPIRE, pp. 2–9 (2001)

    Google Scholar 

  3. Bose, P., He, M., Maheshwari, A., Morin, P.: Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing. In: Dehne, F., Gavrilova, M., Sack, J.-R., Tóth, C.D. (eds.) WADS 2009. LNCS, vol. 5664, pp. 98–109. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  4. Carrascosa, R., Coste, F., Gallé, M., Infante-Lopez, G.: The smallest grammar problem as constituents choice and minimal grammar parsing. MDPI Algorithms 4(4), 262–284 (2011)

    Article  Google Scholar 

  5. Clark, A.: Learning deterministic context free grammars: The Omphalos competition. Machine Learning, 93–110 (January 2007)

    Google Scholar 

  6. Clark, A., Eyraud, R., Habrard, A.: A polynomial algorithm for the inference of context free languages. In: ICGI, pp. 29–42 (July 2008)

    Google Scholar 

  7. Fenwick, P.M.: A New Data Structure for Cumulative Frequency Tables. Softw. Pract. Exper. 24, 327–336 (1994)

    Article  Google Scholar 

  8. Gallé, M.: Searching for Compact Hierarchical Structures in DNA by means of the Smallest Grammar Problem. Université de Rennes 1 (February 2011)

    Google Scholar 

  9. Gallé, M.: The bag-of-repeats representation of documents. In: SIGIR, pp. 1053–1056 (2013)

    Google Scholar 

  10. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press (January 1997)

    Google Scholar 

  11. Manning, C., Raghavan, P., Schütze, H.: Introduction to Inf Retrieval. Cambridge UP (2009)

    Google Scholar 

  12. Navarro, G.: Spaces, Trees and Colors: The Algorithmic Landscape of Document Retrieval on Sequences. arXiv (2013)

    Google Scholar 

  13. Ohlebusch, E., Beller, T., Abouelhoda, M.I.: Computing the Burrows Wheeler transform of a string and its reverse in parallel. Journal of Discrete Algorithms 1, 1–13 (2013), http://linkinghub.elsevier.com/retrieve/pii/S1570866713000397

    Google Scholar 

  14. Puglisi, S., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys 39(2) (July 2007), http://portal.acm.org/citation.cfm?id=1242471.1242472

  15. Puglisi, S.J., Smyth, W.F., Yusufu, M.: Fast optimal algorithms for computing all the repeats in a string. In: Prague Stringology Conference, pp. 161–169 (2008)

    Google Scholar 

  16. Schütze, H.: Automatic Word Sense Discrimination. Comput. Ling. 24(1) (1998)

    Google Scholar 

  17. Solan, Z., Horn, D., Ruppin, E., Edelman, S.: Unsupervised learning of natural languages. PNAS, 11629–11634 (January 2005)

    Google Scholar 

  18. Ukkonen, E.: Online construction of suffix trees. Algorithmica 14, 249–260 (1995)

    Article  MATH  MathSciNet  Google Scholar 

  19. van Zaanen, M.: ABL: Alignment-based learning. In: International Conference on Computational Linguistics, pp. 961–967 (2000)

    Google Scholar 

  20. Zhang, S., Nong, G., Chan, W.H.: Fast and space efficient linear suffix array construction. In: DCC. IEEE Computer Society, Washington, DC (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gallé, M., Tealdi, M. (2014). On Context-Diverse Repeats and Their Incremental Computation. In: Dediu, AH., Martín-Vide, C., Sierra-Rodríguez, JL., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2014. Lecture Notes in Computer Science, vol 8370. Springer, Cham. https://doi.org/10.1007/978-3-319-04921-2_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-04921-2_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04920-5

  • Online ISBN: 978-3-319-04921-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics