Discrepancy Analysis of Complex Objects Using Dissimilarities

Advances in Knowledge Discovery and Management

Part of the book series: Studies in Computational Intelligence ((SCI,volume 292))

Abstract

In this article, we consider objects for which a matrix of pairwise dissimilarities is available, and we are interested in their links with covariates. We focus on state sequences, for which pairwise dissimilarities are given, for instance, by edit distances. The methods discussed apply, however, to any kind of object and any dissimilarity measure. We start with a generalization of the analysis of variance (ANOVA) to assess the link between complex objects (e.g. sequences) and a given categorical variable. The key is to show that the discrepancy among objects can be derived from the pairwise dissimilarities alone, which then makes it possible to identify the factors that most reduce this discrepancy. We present a general statistical test and introduce an original way of rendering the results for state sequences. We then generalize the method to the case of more than one factor and discuss its advantages and limitations, especially regarding interpretation. Finally, we introduce a new tree method for analyzing the discrepancy of complex objects that uses the former test as its splitting criterion. We demonstrate the scope of the methods through a study of the factors that most discriminate Swiss occupational trajectories. All methods presented are freely available in our TraMineR package for the R statistical environment.
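To make the central idea concrete, the following is a minimal Python sketch, not the TraMineR implementation. It assumes the discrepancy is measured by the sum of squares SS = (1/n) Σ_{i<j} d_ij, an identity that holds exactly when d_ij are squared Euclidean distances and is used here as the working definition for any dissimilarity. The function names (`sum_of_squares`, `pseudo_f`, `permutation_test`) and the permutation scheme are illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np

def sum_of_squares(d):
    """Discrepancy of a set of objects from its symmetric dissimilarity
    matrix d: SS = (1/n) * sum over pairs i<j of d[i, j]."""
    n = d.shape[0]
    return np.triu(d, k=1).sum() / n

def pseudo_f(d, groups):
    """Pseudo-F and pseudo-R2 for one categorical factor.
    Assumes at least two groups and a non-zero within-group discrepancy."""
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, m = len(groups), len(labels)
    ss_total = sum_of_squares(d)
    # Within-group discrepancy: sum of the SS of each group's sub-matrix.
    ss_within = sum(
        sum_of_squares(d[np.ix_(groups == g, groups == g)]) for g in labels
    )
    ss_between = ss_total - ss_within
    f = (ss_between / (m - 1)) / (ss_within / (n - m))
    r2 = ss_between / ss_total
    return f, r2

def permutation_test(d, groups, n_perm=999, seed=0):
    """Assess the pseudo-F by randomly reassigning group labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs, r2 = pseudo_f(d, groups)
    # Count permutations whose pseudo-F is at least as large as observed.
    hits = sum(
        pseudo_f(d, rng.permutation(groups))[0] >= f_obs
        for _ in range(n_perm)
    )
    p_value = (hits + 1) / (n_perm + 1)
    return f_obs, r2, p_value
```

In this sketch, `d` would be any symmetric matrix of pairwise dissimilarities (for state sequences, for example, a matrix of edit distances) and `groups` the values of the categorical covariate, one per object. The pseudo-R2 can be read as the share of the total discrepancy accounted for by the factor.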

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Studer, M., Ritschard, G., Gabadinho, A., Müller, N.S. (2010). Discrepancy Analysis of Complex Objects Using Dissimilarities. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00580-0_1

  • DOI: https://doi.org/10.1007/978-3-642-00580-0_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00579-4

  • Online ISBN: 978-3-642-00580-0

  • eBook Packages: Engineering, Engineering (R0)
