Extracting Knowledge from Life Courses: Clustering and Visualization

Müller, Nicolas S.; Gabadinho, Alexis; Ritschard, Gilbert; Studer, Matthias

doi:10.1007/978-3-540-85836-2_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

This article presents some of the facilities offered by our TraMineR R package for clustering and visualizing sequence data. First, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, it can serve as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. We then present three kinds of plots for visualizing the characteristics of the resulting clusters: an aggregated plot depicting the average sequential behavior of cluster members; a sequence index plot that shows the diversity inside clusters; and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analysing sequences representing life courses, and our presentation is illustrated on such a real-world data set. The material presented should also be of interest for other kinds of sequential data, such as those from DNA analysis or web logs.
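As a rough illustration of the first step described above, the following Python sketch computes an optimal matching distance between two state sequences and assembles the pairwise distance matrix. It is not TraMineR's implementation: the function names `om_distance` and `distance_matrix`, the constant costs (`indel=1.0`, `sub_cost=2.0`) and the toy sequences are all assumptions chosen for the example.

```python
from itertools import combinations


def om_distance(a, b, indel=1.0, sub_cost=2.0):
    """Minimal total cost of insertions, deletions and substitutions
    needed to turn sequence a into sequence b (costs are assumptions)."""
    n, m = len(a), len(b)
    # prev[j]: cost of aligning the states of a seen so far with the first j states of b
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            curr[j] = min(
                prev[j] + indel,      # delete a[i-1]
                curr[j - 1] + indel,  # insert b[j-1]
                prev[j - 1] + sub,    # match or substitute
            )
        prev = curr
    return prev[m]


def distance_matrix(seqs, **costs):
    """Symmetric matrix of pairwise optimal matching distances."""
    k = len(seqs)
    d = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        d[i][j] = d[j][i] = om_distance(seqs[i], seqs[j], **costs)
    return d


# Hypothetical toy life courses: S = school, W = work, U = unemployed
sequences = ["SSWWW", "SSSWW", "SWUUW"]
D = distance_matrix(sequences)
```

A matrix such as `D` is the kind of input that a generic clustering routine, for example a hierarchical method, would then take. With a substitution cost equal to twice the indel cost, as assumed here, a substitution is never cheaper than a deletion followed by an insertion.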

Author information

Authors and Affiliations

Department of Econometrics and Laboratory of Demography, University of Geneva,
Nicolas S. Müller, Alexis Gabadinho, Gilbert Ritschard & Matthias Studer

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

About this paper

Cite this paper

Müller, N.S., Gabadinho, A., Ritschard, G., Studer, M. (2008). Extracting Knowledge from Life Courses: Clustering and Visualization. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_17

DOI: https://doi.org/10.1007/978-3-540-85836-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85835-5
Online ISBN: 978-3-540-85836-2
eBook Packages: Computer Science, Computer Science (R0)
