An Efficient Algorithm for Identifying the Most Contributory Substring

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4654)

Abstract

Detecting repeated portions of strings has important applications in many areas of study, including data compression and computational biology. This paper defines the Most Contributory Substring Problem, which asks for the single substring that accounts for the largest proportion of the characters within a set of strings, and presents a solution to it. We show that, when overlapping occurrences of the most contributory substring are permitted, the problem can be solved in $O(n)$ time, where $n$ is the total number of characters in all of the input strings. We also present an extended algorithm that does not permit occurrences of the most contributory substring to overlap. Its expected running time is $O(n \log n)$, and its worst-case running time is $O(n^2)$.
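
To make the problem concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's algorithm. It assumes that the contribution of a substring is its length multiplied by its number of occurrences across all input strings. It also assumes that only substrings occurring at least twice qualify, since otherwise an entire input string would trivially win. The overlapping variant uses a suffix array and an LCP array. Each run of adjacent suffixes sharing a common prefix corresponds to an internal node of a generalized suffix tree, so the best candidate can be found with a "largest rectangle" stack scan over the LCP array. The inputs are joined with unique separators so that no repeat spans two strings. The suffix array is built naively here, so the sketch does not attain the $O(n)$ bound; a linear-time suffix array or suffix tree construction would be needed for that. The non-overlapping variant is only a slow brute-force reference, not the $O(n \log n)$ expected-time algorithm from the abstract. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical result shape: (substring, occurrences, contribution),
# where contribution = len(substring) * occurrences (an assumption).
Result = Optional[Tuple[str, int, int]]


def build_text(strings: List[str]) -> List[int]:
    """Concatenate the inputs as code points, ending each with a unique
    negative separator so that no repeated substring can span two strings."""
    text: List[int] = []
    for k, s in enumerate(strings):
        text.extend(ord(c) for c in s)
        text.append(-(k + 1))
    return text


def lcp_array(text: List[int], sa: List[int]) -> List[int]:
    """Kasai's algorithm: lcp[r] is the longest common prefix length of the
    suffixes at ranks r - 1 and r (lcp[0] = 0)."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        h = max(h - 1, 0)
    return lcp


def most_contributory_overlapping(strings: List[str]) -> Result:
    text = build_text(strings)
    n = len(text)
    # Naive suffix array for clarity; this step is NOT linear time.
    sa = sorted(range(n), key=lambda i: text[i:])
    lcp = lcp_array(text, sa)
    best = (0, 0, 0)  # (contribution, length, start position in text)
    stack: List[Tuple[int, int]] = []  # (first lcp index, shared prefix length)
    for r in range(1, n + 1):
        h = lcp[r] if r < n else 0  # height 0 at the end flushes the stack
        start = r
        while stack and stack[-1][1] >= h:
            s, length = stack.pop()
            # Suffixes at ranks s-1 .. r-1 all share their first `length` symbols.
            occurrences = r - s + 1
            if length > 0 and length * occurrences > best[0]:
                best = (length * occurrences, length, sa[s - 1])
            start = s
        stack.append((start, h))
    contribution, length, pos = best
    if contribution == 0:
        return None  # no substring occurs at least twice
    substring = "".join(chr(c) for c in text[pos:pos + length])
    return substring, contribution // length, contribution


def count_nonoverlapping(s: str, pat: str) -> int:
    """Greedy leftmost scan; yields the maximum number of non-overlapping
    occurrences of one fixed pattern in s."""
    count, i = 0, s.find(pat)
    while i != -1:
        count += 1
        i = s.find(pat, i + len(pat))
    return count


def most_contributory_nonoverlapping_bruteforce(strings: List[str]) -> Result:
    # Reference check only: enumerates every distinct substring (very slow).
    candidates = sorted({s[i:j] for s in strings
                         for i in range(len(s)) for j in range(i + 1, len(s) + 1)})
    best: Result = None
    for pat in candidates:
        occ = sum(count_nonoverlapping(s, pat) for s in strings)
        if occ >= 2 and (best is None or len(pat) * occ > best[2]):
            best = (pat, occ, len(pat) * occ)
    return best
```

For the single string "aaaaa", the overlapping sketch returns ("aaa", 3, 9), while the non-overlapping reference returns ("a", 5, 5). This shows how forbidding overlaps can change the answer. Under the stated assumption, dividing the contribution by the total number of input characters gives the proportion the abstract refers to.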

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stephenson, B. (2007). An Efficient Algorithm for Identifying the Most Contributory Substring. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_25

  • DOI: https://doi.org/10.1007/978-3-540-74553-2_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74552-5

  • Online ISBN: 978-3-540-74553-2

  • eBook Packages: Computer Science, Computer Science (R0)
