Abstract
This article presents some of the facilities offered by our TraMineR R package for clustering and visualizing sequence data. First, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, it can serve as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. We then present three kinds of plots for visualizing the characteristics of the obtained clusters: an aggregated plot depicting the average sequential behavior of cluster members, a sequence index plot that shows the diversity inside clusters, and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analysing sequences representing life courses, and our presentation is illustrated with a real-world data set of this kind. The material presented should also be of interest for other kinds of sequential data, such as those arising in DNA analysis or web logs.
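The optimal matching step described in the abstract can be illustrated with a minimal Python sketch. It is not TraMineR's R implementation or API: it is a generic edit-distance recurrence in which the function names `om_distance` and `distance_matrix`, the constant insertion/deletion cost `indel`, the constant substitution cost `sub_cost`, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def om_distance(a, b, indel=1.0, sub_cost=2.0):
    """Edit distance between two state sequences with assumed constant costs."""
    n, m = len(a), len(b)
    # prev[j]: cheapest way to turn the first i-1 states of a into the first j states of b
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            substitute = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub_cost)
            delete = prev[j] + indel
            insert = curr[j - 1] + indel
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[m]


def distance_matrix(sequences, **costs):
    """Symmetric matrix of pairwise distances for a list of sequences."""
    k = len(sequences)
    d = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        d[i][j] = d[j][i] = om_distance(sequences[i], sequences[j], **costs)
    return d


# Hypothetical toy life courses: S = school, T = training, E = employment
toy = ["SSTE", "SSEE", "TTEE"]
matrix = distance_matrix(toy)
for row in matrix:
    print(row)
```

A matrix of this form is what a clustering routine would take as input. In practice the substitution costs may differ by pair of states, which this sketch does not model.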
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Müller, N.S., Gabadinho, A., Ritschard, G., Studer, M. (2008). Extracting Knowledge from Life Courses: Clustering and Visualization. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_17
DOI: https://doi.org/10.1007/978-3-540-85836-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85835-5
Online ISBN: 978-3-540-85836-2
eBook Packages: Computer Science, Computer Science (R0)