Seriation in the Presence of Errors: A Factor 16 Approximation Algorithm for $l_\infty$-Fitting Robinson Structures to Distances

Algorithmica

Abstract

The classical seriation problem consists in finding a permutation of the rows and columns of the distance (or, more generally, dissimilarity) matrix $d$ on a finite set $X$ so that small values are concentrated as closely as possible around the main diagonal, whereas large values fall as far from it as possible. This goal is best achieved by considering the Robinson property: a distance $d_R$ on $X$ is Robinsonian if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. If the distance $d$ fails to satisfy the Robinson property, then we are led to the problem of finding a reordering of $d$ that is as close as possible to a Robinsonian distance.
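To make the Robinson property concrete, here is a minimal Python sketch that tests whether a symmetric matrix is Robinson in its given order and then searches all permutations for a compatible order. The function names (`is_robinson`, `permute`, `find_robinson_order`) are illustrative assumptions, and the exhaustive search is only a didactic device for very small matrices; it is not the recognition or approximation algorithm of the paper.

```python
from itertools import permutations


def is_robinson(d):
    """Return True if the symmetric matrix d, in its given order, has entries
    that never decrease when moving away from the main diagonal."""
    n = len(d)
    for i in range(n):
        for j in range(i + 1, n):
            # Moving right along row i, away from the diagonal.
            if j + 1 < n and d[i][j] > d[i][j + 1]:
                return False
            # Moving up along column j, away from the diagonal.
            if i - 1 >= 0 and d[i][j] > d[i - 1][j]:
                return False
    return True


def permute(d, order):
    """Apply the same permutation to the rows and the columns of d."""
    return [[d[a][b] for b in order] for a in order]


def find_robinson_order(d):
    """Brute-force search (factorial time, illustration only) for an order
    making d Robinson; returns None if d is not Robinsonian."""
    for order in permutations(range(len(d))):
        if is_robinson(permute(d, order)):
            return list(order)
    return None


# Distances between points on a line at positions 3, 0, 6, 1 (scrambled).
positions = [3, 0, 6, 1]
line = [[abs(p - q) for q in positions] for p in positions]

# Shortest-path distances on a 4-cycle: adjacent pairs 1, opposite pairs 2.
cycle = [[0, 1, 2, 1],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [1, 2, 1, 0]]

print(is_robinson(line))           # the scrambled order is not compatible
print(find_robinson_order(line))   # some compatible order exists
print(find_robinson_order(cycle))  # no compatible order exists
```

The line distances become Robinson once the points are sorted by position, while the 4-cycle distances admit no compatible order, so they would have to be fitted by a Robinsonian dissimilarity at some nonzero error.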

In this paper, we present a factor 16 approximation algorithm for the following NP-hard fitting problem: given a finite set $X$ and a dissimilarity $d$ on $X$, we wish to find a Robinsonian dissimilarity $d_R$ on $X$ minimizing the $l_\infty$-error $\|d - d_R\|_\infty = \max_{x,y \in X} \{|d(x,y) - d_R(x,y)|\}$ between $d$ and $d_R$.
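The objective being minimized is simply the largest entrywise deviation between the two matrices. A minimal sketch, assuming both dissimilarities are given as square lists of lists indexed in the same way; the name `linf_error` and the matrices `d` and `d_r` are hypothetical, and `d_r` is a hand-picked Robinson matrix rather than an optimal or algorithm-produced fit:

```python
def linf_error(d, d_r):
    """Largest absolute entrywise difference between two square matrices
    of the same size, i.e. max over x, y of |d(x, y) - d_r(x, y)|."""
    n = len(d)
    return max(abs(d[x][y] - d_r[x][y]) for x in range(n) for y in range(n))


# A dissimilarity that is not Robinson in the given order ...
d = [[0, 3, 2],
     [3, 0, 1],
     [2, 1, 0]]

# ... and a hand-picked matrix that is Robinson in that same order.
d_r = [[0, 2.5, 2.5],
       [2.5, 0, 1],
       [2.5, 1, 0]]

print(linf_error(d, d_r))
```

In these terms, a factor 16 approximation algorithm returns a Robinsonian dissimilarity whose error is at most 16 times the smallest error achievable by any Robinsonian dissimilarity.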



Author information

Corresponding author

Correspondence to Victor Chepoi.

Additional information

An extended abstract of this paper appeared in the proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science, STACS 2009.

About this article

Cite this article

Chepoi, V., Seston, M. Seriation in the Presence of Errors: A Factor 16 Approximation Algorithm for $l_\infty$-Fitting Robinson Structures to Distances. Algorithmica 59, 521–568 (2011). https://doi.org/10.1007/s00453-009-9319-y

  • DOI: https://doi.org/10.1007/s00453-009-9319-y
