Combining Multiple Interrelated Streams for Incremental Clustering

Siddiqui, Zaigham Faraz; Spiliopoulou, Myra

doi:10.1007/978-3-642-02279-1_38

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5566)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

Many data mining applications analyze structured data that span many tables and accumulate over time. Incremental mining methods have been devised to adapt patterns to new tuples. However, they have been designed for data in one table only. We propose a method for incremental clustering on multiple interrelated streams, which we call a “multi-table stream”: its components are streams that reference each other, arrive at different speeds, and have attributes whose value ranges are not known a priori. Our approach encompasses solutions for the maintenance of caches and sliding windows over the individual streams, the propagation of foreign keys across streams, the transformation of all streams into a single-table stream, and an incremental clustering algorithm that operates over that stream. We evaluate our method on two real datasets and show that it closely approximates the performance of an ideal method that possesses unlimited resources and knows the future.
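
The pipeline named in the abstract (sliding windows, foreign-key joins across streams, flattening into a single-table stream, incremental clustering) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' algorithm: the two streams (customers and orders), the attribute names `id`, `age`, `customer_id` and `amount`, the count-based window, the aggregate features, and the use of sequential k-means are all hypothetical choices made here. The sketch also leaves out caching, differing arrival speeds, and unknown value ranges.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlidingWindow:
    """Keeps only the most recent `size` tuples of one stream (assumed count-based)."""
    size: int
    items: deque = field(default_factory=deque)

    def push(self, item):
        self.items.append(item)
        if len(self.items) > self.size:
            self.items.popleft()  # the oldest tuple falls out of the window


def flatten(customer, order_window):
    """Join one customer with the windowed orders that reference it through the
    foreign key `customer_id`, and aggregate them into one fixed-length row."""
    amounts = [o["amount"] for o in order_window.items
               if o["customer_id"] == customer["id"]]
    count = len(amounts)
    mean = sum(amounts) / count if count else 0.0
    return [customer["age"], float(count), mean]


class OnlineKMeans:
    """Sequential k-means (a stand-in clusterer): each new row pulls its
    nearest centroid towards itself by a step of 1 / (rows seen by that centroid)."""

    def __init__(self, k):
        self.k = k
        self.centroids = []
        self.counts = []

    def update(self, row):
        if len(self.centroids) < self.k:  # seed with the first k rows
            self.centroids.append(list(row))
            self.counts.append(1)
            return len(self.centroids) - 1
        dists = [sum((a - b) ** 2 for a, b in zip(row, c))
                 for c in self.centroids]
        j = dists.index(min(dists))
        self.counts[j] += 1
        step = 1.0 / self.counts[j]
        self.centroids[j] = [c + step * (a - c)
                             for a, c in zip(row, self.centroids[j])]
        return j  # index of the cluster the row was assigned to
```

In use, each arriving order would be pushed into its window, and each arriving customer would be flattened against that window and passed to `update`, so the clusterer only ever sees single-table rows.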

Author information

Authors and Affiliations

Otto-von-Guericke-University of Magdeburg, Magdeburg, 39106, Germany
Zaigham Faraz Siddiqui & Myra Spiliopoulou

Editor information

Editors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL 61801, USA
Marianne Winslett

About this paper

Cite this paper

Siddiqui, Z.F., Spiliopoulou, M. (2009). Combining Multiple Interrelated Streams for Incremental Clustering. In: Winslett, M. (ed.) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_38

DOI: https://doi.org/10.1007/978-3-642-02279-1_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02278-4
Online ISBN: 978-3-642-02279-1
eBook Packages: Computer Science, Computer Science (R0)
