Abstract
Mining frequent patterns is a major topic in data mining research, and it has produced many seminal papers and algorithms for item set and episode discovery. Their combination, called composite episodes, has attracted far less attention in the literature, however. The main reason is that the well-known explosion in the number of frequent patterns is far worse for composite episodes than for item sets or episodes. Yet many applications require composite episodes, e.g., developmental biology, where sequences of gene activity sets over time are analyzed.
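To make the notion concrete, here is a minimal sketch under a simplifying assumption that is not taken from the paper: a composite episode is modelled as an ordered list of item sets (a serial episode whose events are item sets). It occurs in a window of a sequence of item sets, such as gene activity sets over time, if its item sets are contained, in order, in strictly later elements of that window. The names `occurs_in_window` and `window_support` are hypothetical, and the paper's own definition may differ (for example, by also allowing parallel composition).

```python
from typing import FrozenSet, List, Sequence

# Hypothetical simplification: a composite episode is an ordered list of
# item sets, i.e. a serial episode whose events are item sets.
CompositeEpisode = List[FrozenSet[str]]


def occurs_in_window(episode: CompositeEpisode,
                     window: Sequence[FrozenSet[str]]) -> bool:
    """Return True if each item set of `episode` is a subset of some element
    of `window`, at strictly increasing positions (greedy left-to-right)."""
    pos = 0
    for itemset in episode:
        # Advance to the first remaining element that contains this item set.
        while pos < len(window) and not itemset <= window[pos]:
            pos += 1
        if pos == len(window):
            return False
        pos += 1  # the next item set must match a strictly later element
    return True


def window_support(episode: CompositeEpisode,
                   sequence: Sequence[FrozenSet[str]],
                   width: int) -> int:
    """Count sliding windows of length `width` in which `episode` occurs.
    Assumption: windows that would run past either end are not counted."""
    return sum(
        occurs_in_window(episode, sequence[start:start + width])
        for start in range(len(sequence) - width + 1)
    )
```

For example, with `seq = [frozenset("ab"), frozenset("c"), frozenset("acd"), frozenset("b")]`, the episode `[frozenset("a"), frozenset("cd")]` occurs in `seq`, but only in the first of the two windows of width 3. Matching greedily at the earliest position is safe here, because an earlier match never leaves fewer options for the remaining item sets.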
This paper introduces an effective algorithm for discovering a small, descriptive set of composite episodes. It builds on our earlier work, which employs the Minimum Description Length (MDL) principle to find such sets for item sets and episodes. Combining the two yields an optimization problem: for the best results, the descriptive power of the item set and episode components has to be balanced. Again, this problem is solved using MDL.
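The MDL principle the abstract relies on selects the model M that minimizes the two-part code length L(M) + L(D | M). As a hedged sketch, the block below scores a "code table" of item sets for a transaction database, loosely in the spirit of the item set setting the abstract refers to. It is not the paper's encoding for composite episodes. The greedy cover order, the per-item "standard" code for the model, and the function names `cover` and `description_length` are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import FrozenSet, List

ItemSet = FrozenSet[str]


def cover(transaction: ItemSet, code_table: List[ItemSet]) -> List[ItemSet]:
    """Greedily cover a transaction with disjoint code-table item sets,
    tried in the order given (assumed: longer item sets first)."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    if remaining:
        raise ValueError("code table must contain all singletons")
    return used


def description_length(db: List[ItemSet], code_table: List[ItemSet]) -> float:
    """Two-part MDL score L(CT) + L(D | CT), in bits."""
    covers = [cover(t, code_table) for t in db]
    usage = Counter(x for c in covers for x in c)
    total_usage = sum(usage.values())

    def code_len(x: ItemSet) -> float:
        # Shannon-optimal code length from how often x is used in the covers.
        return -math.log2(usage[x] / total_usage)

    # Assumed "standard" code: spell out an item set with per-item codes
    # derived from item frequencies in the data.
    item_freq = Counter(i for t in db for i in t)
    n_items = sum(item_freq.values())

    def std_len(x: ItemSet) -> float:
        return sum(-math.log2(item_freq[i] / n_items) for i in x)

    data_bits = sum(code_len(x) for c in covers for x in c)
    # Only item sets that are actually used contribute to the model cost.
    model_bits = sum(std_len(x) + code_len(x) for x in usage)
    return model_bits + data_bits
```

On a toy database such as `[frozenset("abc")] * 8 + [frozenset("ab"), frozenset("c")]`, a code table that puts `frozenset("abc")` before the singletons yields a smaller total than the singletons alone. Under MDL, this is the sense in which the pattern "matters": it pays for its own model cost by shortening the encoded data.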
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bathoorn, R., Siebes, A. (2008). Finding Composite Episodes. In: Raś, Z.W., Tsumoto, S., Zighed, D. (eds) Mining Complex Data. MCD 2007. Lecture Notes in Computer Science, vol 4944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68416-9_13
DOI: https://doi.org/10.1007/978-3-540-68416-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68415-2
Online ISBN: 978-3-540-68416-9
eBook Packages: Computer Science, Computer Science (R0)