A New Combinatorial Approach to Sequence Comparison

  • Conference paper
Theoretical Computer Science (ICTCS 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3701)

Abstract

In this paper we introduce a new alignment-free method for comparing sequences that is combinatorial by nature and uses neither a compressor nor any information-theoretic notion. The method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in data compression. The extended transformation takes a multiset of sequences as input and outputs a string obtained by a suitable rearrangement of the characters of all the input sequences. Using this transformation, we define a measure for comparing sequences that takes into account how the characters coming from different input sequences are mixed in the output string. The method is tested on a real data set for the whole mitochondrial genome phylogeny problem. The goal of this paper, however, is to introduce a new and general methodology for automatic categorization of sequences.
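
The abstract defines neither the sorting rule nor the measure, so the following Python sketch is only an illustration of the idea under stated assumptions. It assumes the extended transform sorts all rotations of all input sequences by comparing their infinite repetitions (a prefix of length |u| + |v| is enough to decide the comparison) and outputs the last character of each rotation. The function `mixing_score` is a hypothetical stand-in for the measure: it only penalizes long runs of characters coming from the same input sequence, and the paper's actual definition may differ. The names `omega_cmp`, `ebwt`, and `mixing_score` are introduced here and do not come from the paper.

```python
from functools import cmp_to_key
from itertools import groupby


def omega_cmp(a, b):
    """Compare two (rotation, source_index) pairs by the infinite
    repetitions of their rotations. Assumption: a prefix of length
    len(u) + len(v) suffices to decide the order."""
    u, v = a[0], b[0]
    n = len(u) + len(v)
    pu = (u * (n // len(u) + 1))[:n]
    pv = (v * (n // len(v) + 1))[:n]
    return (pu > pv) - (pu < pv)


def ebwt(seqs):
    """Sketch of an extended Burrows-Wheeler transform on a list of
    sequences. Returns the output string and, for each output
    character, the index of the input sequence it came from."""
    rotations = [
        (s[i:] + s[:i], idx)
        for idx, s in enumerate(seqs)
        for i in range(len(s))
    ]
    # Python's sort is stable, so rotations that compare equal keep
    # their input order.
    rotations.sort(key=cmp_to_key(omega_cmp))
    text = "".join(rot[-1] for rot, _ in rotations)
    origin = [idx for _, idx in rotations]
    return text, origin


def mixing_score(origin):
    """Hypothetical measure (not the paper's definition): every run of
    consecutive characters from the same source contributes its
    length minus one. A lower score means the sources are more
    finely interleaved in the output."""
    return sum(len(list(group)) - 1 for _, group in groupby(origin))


if __name__ == "__main__":
    text, origin = ebwt(["ab", "aba"])
    print(text, origin, mixing_score(origin))
```

On a single sequence this sketch reduces to the sentinel-free Burrows-Wheeler transform of its rotations. Two identical sequences interleave perfectly and score 0, while two sequences over disjoint alphabets separate into two long runs and score high.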

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mantaci, S., Restivo, A., Rosone, G., Sciortino, M. (2005). A New Combinatorial Approach to Sequence Comparison. In: Coppo, M., Lodi, E., Pinna, G.M. (eds) Theoretical Computer Science. ICTCS 2005. Lecture Notes in Computer Science, vol 3701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11560586_28

  • DOI: https://doi.org/10.1007/11560586_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29106-0

  • Online ISBN: 978-3-540-32024-1

  • eBook Packages: Computer Science (R0)
