Skip to main content
Log in

A New Combinatorial Approach to Sequence Comparison

  • Published:
Theory of Computing Systems Aims and scope Submit manuscript

Abstract

In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output string. Such a method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for automatic categorization of sequences.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Benedetto, D., Caglioti, E., Loreto, V.: Zipping out relevant information, Comput. Sci. Eng. 80–85 (2003)

  2. Bennett, C.H., Gács, P., Li, M., Vitányi, P., Zurek, W.: Information distance. IEEE Trans. Inf. Theory 44(4), 1407–1423 (1998)

    Article  MATH  Google Scholar 

  3. Burrows, M., Wheeler, D.J.: A block sorting data compression algorithm. Technical report, DIGITAL System Research Center (1994)

  4. Cao, Y., Janke, A., Waddell, P.J., Westerman, M., Takenaka, O., Murata, S., Okada, N., Pääbo, S., Hasegawa, M.: Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J. Mol. Evol. 47, 307–322 (1998)

    Article  Google Scholar 

  5. Cilibrasi, R., Vitányi, P.: Clustering by compression. IEEE Trans. Inf. Theory 51(4), 1523–1545 (2005)

    Article  Google Scholar 

  6. Crochemore, M., Désarménien, J., Perrin, D.: A note on the Burrows-Wheeler transformation. Theor. Comput. Sci. 332, 567–572 (2005)

    Article  MATH  Google Scholar 

  7. Ergun, F., Muthukrishnan, S., Sahinalp, C.: Comparing sequences with segment rearrangements. In: Proc. of the FSTTCS’03, Bombay, India. Lecture Notes in Computer Science, pp. 183–194. Springer, Berlin (2003)

    Google Scholar 

  8. Fenwick, P.: The Burrows-Wheeler transform for block sorting text compression: principles and improvements. Comput. J. 39(9), 731–740 (1996)

    Google Scholar 

  9. Gessel, I.M., Reutenauer, C.: Counting permutations with given cycle structure and descent set. J. Comb. Theory Ser. A 64(2), 189–215 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  10. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Proc. ICALP, pp. 943–955, 2003

  11. Li, M., Badger, J.H., Chen, X., Kwong, S., Kearney, P., Zhang, H.: An information based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17, 149–154 (2001)

    Article  Google Scholar 

  12. Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.: The similarity metric. IEEE Trans. Inf. Theory 12(5), 3250–3264 (2004)

    Article  Google Scholar 

  13. Lothaire, M.: Combinatorics on Words. Encyclopedia of Mathematics, vol. 17. Addison–Wesley, Reading (1983). Reprinted in the Cambridge Mathematical Library, Cambridge University Press (1997)

    Google Scholar 

  14. Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press, Cambridge (2002)

    MATH  Google Scholar 

  15. Mantaci, S., Restivo, A., Sciortino, M.: Burrows-Wheeler transform and Sturmian words. Inf. Proc. Lett. 86, 241–246 (2003)

    Article  MathSciNet  Google Scholar 

  16. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An extension of the Burrows Wheeler transform and applications to sequence comparison and data compression. In: Apostolico, A., Crochemore, M., Park, K. (eds.) Combinatorial Pattern Matching, 16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19–22, 2005, Proceedings. Lecture Notes in Computer Science, vol. 3537, pp. 178–189. Springer, Berlin (2005)

    Google Scholar 

  17. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: A new combinatorial approach to sequence comparison. In: Coppo, M., Lodi, E., Pinna, G.M. (eds.) Proceedings of ICTCS 2005, Theoretical Computer Science, 9th Italian Conference, ICTCS 2005, Siena, Italy, October 12–14, 2005. Lecture Notes in Computer Science, vol. 3701, pp. 348–359. Springer, Berlin (2005)

    Google Scholar 

  18. Mantaci, S., Rosone, G., Restivo, A., Sciortino, M.: An extension of the burrows wheeler transform. Theor. Comput. Sci. (to appear)

  19. Manzini, G.: The Burrows-Wheeler transform: theory and practice. In: Proc. of the 24th International Symposium on Mathematical Foundations of Computer Science (MFCS ’99). Lecture Notes in Computer Science, vol. 1672, pp. 34–47. Springer, Berlin (1999)

    Google Scholar 

  20. Otu, H.H., Sayood, K.: A new sequence distance measure for phylogenetic tree construction. Bioinformatics 19(16), 2122–2130 (2003)

    Article  Google Scholar 

  21. Pevzner, P.A.: Statistical distance between texts and filtration methods in sequence comparison. Comput. Appl. Biosci. (8), 121–127 (1992)

  22. Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4

  23. Vinga, S., Almeida, J.: Alignment-free sequence comparison—a review. Bioinformatics 19(4), 513–523 (2003)

    Article  Google Scholar 

  24. Wilf, H.S., Fine, N.J.: Uniqueness theorem for periodic functions. Proc. Am. Math. Soc. 16, 109–114 (1965)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to M. Sciortino.

Additional information

This work was partially supported by the Italian MIUR PRIN project “Automi e Linguaggi Formali: aspetti matematici ed applicativi” and by MIUR FIRB Italy-Israel project “Pattern Matching and Discovery in Discrete Structures, with applications to Bioinformatics”.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Mantaci, S., Restivo, A., Rosone, G. et al. A New Combinatorial Approach to Sequence Comparison. Theory Comput Syst 42, 411–429 (2008). https://doi.org/10.1007/s00224-007-9078-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00224-007-9078-6

Keywords

Navigation