Extracting Knowledge from Life Courses: Clustering and Visualization

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)


Abstract

This article presents some of the facilities offered by our TraMineR R package for clustering and visualizing sequence data. First, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences, and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, it can be used as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. We then present three kinds of plots for visualizing the characteristics of the obtained clusters: an aggregated plot depicting the average sequential behavior of cluster members; a sequence index plot that shows the diversity inside clusters; and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analyzing sequences representing life courses, and our presentation is illustrated with such a real-world data set. The material presented should also be of interest for other kinds of sequential data, such as DNA analysis or web logs.
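The pipeline described in the abstract (pairwise optimal matching distance, then a full distance matrix, then cluster summaries) can be illustrated with a short sketch. This is a minimal Python illustration, not TraMineR's R implementation: it assumes sequences are strings of single-letter states, a constant substitution cost of 2 and an indel cost of 1 (illustrative values only; in practice the costs are a modelling choice), and all helper names (`constant_sub_cost`, `om_distance`, `distance_matrix`, `state_distribution`, `most_frequent`) are hypothetical. The last two helpers compute the quantities behind an aggregated (state distribution) plot and a frequency plot of the n most frequent sequences.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

State = str
SubCost = Callable[[State, State], float]


def constant_sub_cost(cost: float = 2.0) -> SubCost:
    """Hypothetical helper: 0 for identical states, `cost` for any substitution."""
    return lambda x, y: 0.0 if x == y else cost


def om_distance(a: Sequence[State], b: Sequence[State],
                sub: SubCost, indel: float = 1.0) -> float:
    """Optimal matching distance: minimal total cost of insertions, deletions
    and substitutions that turn sequence `a` into sequence `b` (dynamic programming)."""
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to transform a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete every element of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert every element of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                        # deletion
                d[i][j - 1] + indel,                        # insertion
                d[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or substitution
            )
    return d[n][m]


def distance_matrix(seqs: List[Sequence[State]], sub: SubCost,
                    indel: float = 1.0) -> List[List[float]]:
    """Pairwise OM distances; assumes a symmetric substitution cost, so only
    the upper triangle is computed and mirrored."""
    k = len(seqs)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = om_distance(seqs[i], seqs[j], sub, indel)
    return dist


def state_distribution(seqs: List[Sequence[State]]) -> List[Dict[State, float]]:
    """Share of sequences in each state at each position (data behind an
    aggregated plot); assumes all sequences have the same length."""
    if not seqs:
        return []
    out = []
    for t in range(len(seqs[0])):
        counts = Counter(s[t] for s in seqs)
        out.append({state: c / len(seqs) for state, c in counts.items()})
    return out


def most_frequent(seqs: List[Sequence[State]], n: int) -> List[Tuple[Tuple[State, ...], int]]:
    """The n most frequent whole sequences with their counts (data behind a frequency plot)."""
    return Counter(tuple(s) for s in seqs).most_common(n)


if __name__ == "__main__":
    # Toy life courses (hypothetical data): S = school, E = employment
    seqs = ["SSEE", "SSSE", "SEEE", "SSEE"]
    sub = constant_sub_cost(2.0)
    print(om_distance("SSSE", "SEEE", sub))  # → 4.0
    for row in distance_matrix(seqs, sub):
        print(row)
    print(state_distribution(seqs))
    print(most_frequent(seqs, 2))
```

The resulting matrix is what would then be handed to a clustering routine, for instance a hierarchical clustering. Note the design consequence of the assumed costs: with a substitution cost equal to twice the indel cost, a substitution is never cheaper than a deletion plus an insertion, so the distance reduces to a longest-common-subsequence measure.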

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Müller, N.S., Gabadinho, A., Ritschard, G., Studer, M. (2008). Extracting Knowledge from Life Courses: Clustering and Visualization. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_17

  • DOI: https://doi.org/10.1007/978-3-540-85836-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85835-5

  • Online ISBN: 978-3-540-85836-2

  • eBook Packages: Computer Science, Computer Science (R0)
