Abstract
The classical seriation problem consists in finding a permutation of the rows and columns of a distance (or, more generally, dissimilarity) matrix $d$ on a finite set $X$ such that small values are concentrated as close as possible to the main diagonal, whereas large values fall as far from it as possible. This goal is best achieved by considering the Robinson property: a distance $d_R$ on $X$ is Robinsonian if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. If the distance $d$ fails to satisfy the Robinson property, then we are led to the problem of finding a reordering of $d$ which is as close as possible to a Robinsonian distance.
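The Robinson property can be made concrete with a small check. The following is a minimal sketch, not the algorithm of the paper: the function names `is_robinson`, `permute`, and `is_robinsonian` are hypothetical, the matrix is assumed to be symmetric with a zero diagonal, and the search over all orderings is brute force, so it is only usable on very small sets.

```python
from itertools import permutations


def is_robinson(d):
    """Return True if the symmetric matrix d, in its current order, has
    entries that never decrease when moving away from the main diagonal."""
    n = len(d)
    for i in range(n):
        for j in range(i, n):  # upper triangle suffices by symmetry
            # One step to the right along row i moves away from the diagonal.
            if j + 1 < n and d[i][j] > d[i][j + 1]:
                return False
            # One step up along column j moves away from the diagonal.
            if i > 0 and d[i][j] > d[i - 1][j]:
                return False
    return True


def permute(d, order):
    """Apply the same permutation to the rows and columns of d."""
    return [[d[a][b] for b in order] for a in order]


def is_robinsonian(d):
    """Brute force (assumption: tiny n): try every ordering of X."""
    return any(is_robinson(permute(d, p)) for p in permutations(range(len(d))))


# Distances between the points 0, 1, 3 on a line, listed in line order.
line = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]

# A star: one center at distance 1 from three leaves, leaves at distance 2.
star = [[0, 1, 1, 1],
        [1, 0, 2, 2],
        [1, 2, 0, 2],
        [1, 2, 2, 0]]

print(is_robinson(line))                      # True
print(is_robinson(permute(line, [1, 0, 2])))  # False: wrong order
print(is_robinsonian(permute(line, [1, 0, 2])))  # True: some order works
print(is_robinsonian(star))                   # False: no order works
```

The star example shows a distance that no reordering makes Robinson, which is the situation in which a closest Robinsonian distance has to be sought.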
In this paper, we present a factor 16 approximation algorithm for the following NP-hard fitting problem: given a finite set $X$ and a dissimilarity $d$ on $X$, we wish to find a Robinsonian dissimilarity $d_R$ on $X$ minimizing the $\ell_\infty$-error $\|d-d_R\|_\infty=\max_{x,y\in X}\{|d(x,y)-d_R(x,y)|\}$ between $d$ and $d_R$.
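As an illustration of the objective only, the $\ell_\infty$-error between a dissimilarity and a candidate fit can be computed directly from its definition. This sketch assumes both dissimilarities are given as square lists of lists indexed the same way; the name `linf_error` and the example matrices are hypothetical, and the candidate shown is not claimed to be optimal.

```python
def linf_error(d, d_r):
    """Largest absolute entrywise difference between d and d_r."""
    n = len(d)
    return max(abs(d[x][y] - d_r[x][y]) for x in range(n) for y in range(n))


# A star distance: center at distance 1 from three leaves, leaves at distance 2.
star = [[0, 1, 1, 1],
        [1, 0, 2, 2],
        [1, 2, 0, 2],
        [1, 2, 2, 0]]

# A candidate with every off-diagonal entry equal to 1.5. Its entries never
# decrease away from the diagonal, so it is Robinson in any order.
flat = [[0, 1.5, 1.5, 1.5],
        [1.5, 0, 1.5, 1.5],
        [1.5, 1.5, 0, 1.5],
        [1.5, 1.5, 1.5, 0]]

print(linf_error(star, flat))  # 0.5
```

The fitting problem asks for the Robinsonian candidate that makes this value as small as possible.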
An extended abstract of this paper appeared in the proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science, STACS 2009.
Chepoi, V., Seston, M.: Seriation in the Presence of Errors: A Factor 16 Approximation Algorithm for l∞-Fitting Robinson Structures to Distances. Algorithmica 59, 521–568 (2011). https://doi.org/10.1007/s00453-009-9319-y