Abstract
Many data mining applications analyze structured data that span many tables and accumulate over time. Incremental mining methods have been devised to adapt patterns to new tuples, but they are designed for data in a single table only. We propose a method for incremental clustering over multiple interrelated streams that together form a “multi-table stream”: its components are streams that reference each other, arrive at different speeds, and have attributes whose value ranges are unknown a priori. Our approach encompasses solutions for maintaining caches and sliding windows over the individual streams, propagating foreign keys across streams, transforming all streams into a single-table stream, and an incremental clustering algorithm that operates on that stream. We evaluate our method on two real datasets and show that it closely approximates the performance of an ideal method that has unlimited resources and knows the future.
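The abstract names the stages of the pipeline but not their algorithms. The Python sketch below is a hypothetical illustration of how such stages could fit together for one parent stream and one child stream that references it by foreign key; it is not the authors' method. All names (`SlidingWindow`, `flatten`, `RunningScaler`, `OnlineKMeans`, `process_parent`) are invented for illustration. The aggregates (child count and mean) are an assumed way of flattening into a single-table row, a running min-max scaler is one assumed way to handle value ranges that are not known in advance, and sequential k-means is swapped in as a stand-in incremental clustering algorithm. Caching and multi-hop foreign-key propagation are omitted.

```python
from collections import defaultdict, deque
import math


class SlidingWindow:
    """Hypothetical time-based window over a child stream, grouped by foreign key."""

    def __init__(self, width):
        self.width = width
        self.by_key = defaultdict(deque)  # foreign key -> deque of (timestamp, value)

    def add(self, fk, ts, value):
        self.by_key[fk].append((ts, value))  # assumes tuples arrive in timestamp order

    def evict(self, now):
        # Drop child tuples with timestamp <= now - width; forget keys left empty.
        for fk in list(self.by_key):
            q = self.by_key[fk]
            while q and q[0][0] <= now - self.width:
                q.popleft()
            if not q:
                del self.by_key[fk]

    def values(self, fk):
        q = self.by_key.get(fk)  # .get avoids creating an empty entry
        return [v for _, v in q] if q else []


def flatten(parent_attr, child_values):
    # Assumed flattening: parent attribute plus count and mean of its children.
    n = len(child_values)
    mean = sum(child_values) / n if n else 0.0
    return [parent_attr, float(n), mean]


class RunningScaler:
    """Min-max scaling with ranges learned on the fly (value ranges unknown a priori)."""

    def __init__(self, dim):
        self.lo = [math.inf] * dim
        self.hi = [-math.inf] * dim

    def update_and_scale(self, x):
        out = []
        for i, v in enumerate(x):
            self.lo[i] = min(self.lo[i], v)
            self.hi[i] = max(self.hi[i], v)
            span = self.hi[i] - self.lo[i]
            out.append((v - self.lo[i]) / span if span > 0 else 0.0)
        return out


class OnlineKMeans:
    """Sequential k-means, used here only as a stand-in incremental clusterer."""

    def __init__(self, k):
        self.k = k
        self.centers = []
        self.counts = []

    def update(self, x):
        if len(self.centers) < self.k:  # seed centers with the first k rows
            self.centers.append(list(x))
            self.counts.append(1)
            return len(self.centers) - 1
        j = min(range(self.k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centers[c])))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step size shrinks as the cluster grows
        self.centers[j] = [c + eta * (a - c) for a, c in zip(x, self.centers[j])]
        return j


def process_parent(parent_key, parent_attr, now, window, scaler, model):
    # On each parent arrival: expire old children, flatten, rescale, cluster.
    window.evict(now)
    row = flatten(parent_attr, window.values(parent_key))
    return model.update(scaler.update_and_scale(row))


if __name__ == "__main__":
    window, scaler, model = SlidingWindow(width=10), RunningScaler(dim=3), OnlineKMeans(k=2)
    window.add("c1", ts=1, value=5.0)
    window.add("c1", ts=3, value=7.0)
    print(process_parent("c1", 30.0, now=12, window=window, scaler=scaler, model=model))
```

Because the scaler's ranges can widen as new values arrive, cluster centres computed earlier are expressed in an older scale; a real system would need to account for that drift.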
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Siddiqui, Z.F., Spiliopoulou, M. (2009). Combining Multiple Interrelated Streams for Incremental Clustering. In: Winslett, M. (ed.) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_38
DOI: https://doi.org/10.1007/978-3-642-02279-1_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02278-4
Online ISBN: 978-3-642-02279-1
eBook Packages: Computer Science, Computer Science (R0)