Interesting pattern mining in multi-relational data

Data Mining and Knowledge Discovery

Abstract

Mining patterns from multi-relational data is a problem attracting increasing interest within the data mining community. Traditional data mining approaches are typically developed for single-table databases and are not directly applicable to multi-relational data. Nevertheless, multi-relational data is a more truthful, and therefore often also a more powerful, representation of reality. Mining patterns of a suitably expressive syntax directly from this representation is thus a research problem of great importance. In this paper we introduce a novel approach to mining patterns in multi-relational data. We propose a new syntax for multi-relational patterns as complete connected subsets of database entities. We show how this pattern syntax is generally applicable to multi-relational data, while it reduces to the well-known tiles of Geerts et al. (Proceedings of Discovery Science, pp 278–289, 2004) when the data is a simple binary or attribute-value table. We propose RMiner, a simple yet practically efficient divide-and-conquer algorithm to mine such patterns, which is an instantiation of an algorithmic framework for efficiently enumerating all fixed points of a suitable closure operator (Boley et al., Theor Comput Sci 411(3):691–700, 2010). We show how the interestingness of patterns of the proposed syntax can conveniently be quantified using a general framework for quantifying subjective interestingness of patterns (De Bie, Data Min Knowl Discov 23(3):407–446, 2011b). Finally, we illustrate the usefulness and the general applicability of our approach by discussing results on real-world and synthetic databases.
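To make the single-table special case mentioned in the abstract concrete, the following is a minimal sketch of a tile in a binary table: a set of rows and a set of columns whose cells all contain 1. The toy table and the helper names `is_tile` and `close_tile` are assumptions made for illustration only. They are not taken from the paper, and the sketch does not implement RMiner or the multi-relational pattern syntax.

```python
from itertools import product

# Toy binary table (hypothetical data): rows are transactions, columns are items.
TABLE = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]


def is_tile(table, rows, cols):
    """Return True if every cell in rows x cols holds a 1."""
    return all(table[r][c] == 1 for r, c in product(rows, cols))


def close_tile(table, rows):
    """Extend a row set to the largest tile that contains it.

    Columns: those holding a 1 in every given row.
    Rows: all rows holding a 1 in every such column.
    """
    n_cols = len(table[0])
    cols = {c for c in range(n_cols) if all(table[r][c] for r in rows)}
    closed_rows = {r for r in range(len(table)) if all(table[r][c] for c in cols)}
    return closed_rows, cols


if __name__ == "__main__":
    print(is_tile(TABLE, {0, 1, 2}, {0, 1}))
    print(close_tile(TABLE, {0, 1}))
```

Applying `close_tile` to its own row output returns the same tile again, which is the sense in which such a tile is a fixed point of a closure operator in this simplified setting.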

Notes

  1. In contrast to some traditional fixpoint enumeration algorithms, such as those used in formal concept analysis, this divide-and-conquer approach assumes neither an underlying complete lattice nor that the fixpoint set is closed under intersection. This is important because the set system of complete connected subsets (CCSs) is not necessarily closed under intersection (due to connectivity), and two maximal CCSs (MCCSs) cannot be joined to a common supremum (due to completeness).

  2. Strongly accessible set systems generalize greedoids such as, e.g., poset ideals [see Boley (2011, Sect. 3.5.2) and Korte and Lovász (1985)].

  3. Please note that entities and entity types here refer to our notion of these terms. Nijssen et al. (2011) define the same notions as objects and entities, respectively.

  4. Note that the quadratic space complexity of RMiner results from multiplying a linear space complexity by the maximal search tree depth, which, as we will show in Sect. 7.3, is a small constant in practice. Also, as discussed in Sect. 3.5, the practical time delay of RMiner depends on the density of the data set and can be optimised through particular implementation choices. Thus, even though the theoretical complexities of the algorithm of Makino and Uno (2004) and of RMiner are comparable, RMiner probably scales better in practice.

  5. See http://www.imdb.com/

  6. See http://www.informatik.uni-trier.de/~ley/db/
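The connectivity argument in Note 1 can be illustrated on a plain undirected graph: two connected node sets may have an intersection that is not connected. The sketch below is only an analogy under that assumption. The graph, the node names and the helper `is_connected` are hypothetical, and the paper's relational definition of connectedness is not reproduced here.

```python
from collections import deque

# Hypothetical undirected graph: the 4-cycle a-b-c-d-a.
EDGES = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}


def is_connected(nodes, edges):
    """Return True if `nodes` induces a connected subgraph of `edges`.

    The empty set is treated as connected (a convention chosen for this sketch).
    """
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for x, y in edges:
            # Each edge is undirected, so try both orientations.
            for src, dst in ((x, y), (y, x)):
                if src == current and dst in nodes and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return seen == nodes


if __name__ == "__main__":
    first = {"a", "b", "c"}
    second = {"a", "d", "c"}
    print(is_connected(first, EDGES), is_connected(second, EDGES))
    print(is_connected(first & second, EDGES))
```

Both `first` and `second` are connected, but their intersection `{"a", "c"}` is not, because `a` and `c` share no edge. An enumeration scheme that relies on intersections staying inside the set system would therefore not apply to such a family.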

References

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB), pp 487–499

  • Angles R, Gutierrez C (2008) Survey of graph database models. ACM Comput Surv 40(1):1:1–1:39

  • Birkhoff G (1967) Lattice theory. American Mathematical Society, Providence

  • Boley M (2011) The efficient discovery of interesting closed pattern collections. PhD thesis, University of Bonn, Bonn

  • Boley M, Horvath T, Poigné A, Wrobel S (2010) Listing closed sets of strongly accessible set systems with applications to data mining. Theor Comput Sci 411(3):691–700

  • Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575–577

  • Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) Mafia: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504

  • Calders T, Goethals B (2007) Non-derivable itemset mining. Data Min Knowl Discov 14(1):171–206

  • Cerf L, Besson J, Robardet C, Boulicaut JF (2009) Closed patterns meet n-ary relations. ACM Trans Knowl Discov Data 3(1):3:1–3:36

  • Cover TM, Thomas JA (2005) Elements of information theory. Wiley, Hoboken

  • De Bie T (2011a) An information theoretic framework for data mining. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 564–572

  • De Bie T (2011b) Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Min Knowl Discov 23(3):407–446

  • De Bie T, Kontonasios KN, Spyropoulou E (2010) A framework for mining interesting pattern sets. In: SIGKDD explorations, pp 92–100

  • De Raedt L, Zimmermann A (2007) Constraint-based pattern set mining. In: Proceedings of the SIAM international conference on data mining (SDM), pp 237–248

  • Dehaspe L, Toivonen H (1999) Discovery of frequent datalog patterns. Data Min Knowl Discov 3:7–36

  • Elmasri R, Navathe SB (2006) Fundamentals of database systems. Addison Wesley, Boston

  • Garriga GC, Khardon R, De Raedt L (2007) On mining closed sets in multi-relational data. In: Proceedings of the 20th international joint conference on artificial intelligence (IJCAI), pp 804–809

  • Geerts F, Goethals B, Mielikäinen T (2004) Tiling databases. In: Proceedings of discovery science, pp 278–289

  • Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. In: ACM computing surveys, vol 38. ACM, New York

  • Gionis A, Mannila H, Mielikäinen T, Tsaparas P (2007) Assessing data mining results via swap randomization. ACM Trans Knowl Discov Data 1(3):14

  • Goethals B, Le Page W (2008) Mining association rules of simple conjunctive queries. In: Proceedings of the SIAM international conference on data mining (SDM), Atlanta

  • Goethals B, Le Page W, Mampaey M (2010) Mining interesting sets and rules in relational databases. In: Proceedings of the ACM symposium on applied computing (SAC), pp 997–1001

  • Gupta R, Fang G, Field B, Steinbach M, Kumar V (2008) Quantitative evaluation of approximate frequent pattern mining algorithms. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 301–309

  • Hanhijarvi S, Ojala M, Vuokko N, Puolamaki K, Tatti N, Mannila H (2009) Tell me something I don’t know: randomization strategies for iterative data mining. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, New York, pp 379–388

  • Jäschke R, Hotho A, Schmitz C, Ganter B, Stumme G (2008) Discovering shared conceptualizations in folksonomies. Web Semant 6(1):38–53

  • Jen TY, Laurent D, Spyratos N (2010) Computing supports of conjunctive queries on relational tables with functional dependencies. Fundam Inf 99(3):263–292

  • Ji M, Han J, Danilevsky M (2011) Ranking-based classification of heterogeneous information networks. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 1298–1306

  • Ji M, Sun Y, Danilevsky M, Han J, Gao J (2010) Graph regularized transductive classification on heterogeneous information networks. In: ECML/PKDD (1), pp 570–586

  • Ji L, Tan KL, Tung AKH (2006) Mining frequent closed cubes in 3D datasets. In: Proceedings of the international conference on very large data bases (VLDB), VLDB endowment, pp 811–822

  • Kontonasios K, Spyropoulou E, De Bie T (2012) Knowledge discovery interestingness measures based on unexpectedness. In: Wiley interdisciplinary reviews: data mining and knowledge discovery, pp 386–399

  • Koopman A, Siebes A (2008) Discovering relational item sets efficiently. In: Proceedings of the SIAM conference on data mining (SDM), pp 108–119

  • Koopman A, Siebes A (2009) Characteristic relational patterns. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 437–446

  • Korte B, Lovász L (1985) Relations between subclasses of greedoids. Math Methods Oper Res 29:249–267

  • Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 313–320

  • Lawler EL, Lenstra JK, Kan AHGR (1980) Generating all maximal independent sets: NP-hardness and polynomial-time algorithms. SIAM J Comput 9(3):558–565

  • Makino K, Uno T (2004) New algorithms for enumerating all maximal cliques. In: Scandinavian workshop on algorithm theory (SWAT), pp 260–272

  • Maruhashi K, Guo F, Faloutsos C (2011) Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis. In: Proceedings of the international conference on advances in social networks analysis and mining, ASONAM ’11, pp 203–210

  • Ng EKK, Ng K, Fu AWC, Wang K (2002) Mining association rules from stars. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 322–329

  • Nijssen S, Jiménez A, Guns T (2011) Constraint-based pattern mining in multi-relational databases. In: ICDM workshops, pp 1120–1127

  • Nijssen S, Kok J (2003) Efficient frequent query discovery in FARMER. In: Proceedings of the European conference on principles and practice of knowledge discovery in databases (PKDD), pp 350–362

  • Ojala M, Garriga GC, Gionis A, Mannila H (2010) Evaluating query result significance in databases via randomizations. In: Proceedings of the SIAM conference on data mining (SDM), pp 906–917

  • Pardalos PM, Xue J (1994) The maximum clique problem. J Glob Optim 4:301–328

  • Poernomo AK, Gopalkrishnan V (2009) Towards efficient mining of proportional fault-tolerant frequent itemsets. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 697–706

  • Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of the SIAM conference on data mining (SDM), pp 393–404

  • Spyropoulou E, De Bie T (2011) Interesting multi-relational patterns. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 675–684

  • Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 1–12

  • Sun Y, Han J, Aggarwal CC, Chawla NV (2012a) When will it happen?: relationship prediction in heterogeneous information networks. In: Proceedings of the fifth ACM international conference on Web search and data mining, WSDM ’12, pp 663–672

  • Sun Y, Norick B, Han J, Yan X, Yu PS, Yu X (2012b) Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. In: KDD, pp 1348–1356

  • Sun Y, Yu Y, Han J (2009) Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 797–806

  • Tang L, Wang X, Liu H (2012) Community detection via heterogeneous interaction analysis. Data Min Knowl Discov 25(1):1–33

  • Trabelsi C, Jelassi N, Ben Yahia S (2012) Scalable mining of frequent tri-concepts from folksonomies. In: Advances in knowledge discovery and data mining, pp 231–242

  • Uno T, Asai T, Uchida Y, Arimura H (2004a) An efficient algorithm for enumerating closed patterns in transaction databases. In: Discovery science, pp 16–31

  • Uno T, Kiyomi M, Arimura H (2004b) Lcm ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations (FIMI), Brighton

  • Voutsadakis G (2002) Polyadic concept analysis. Order 19(3):295–304

  • Yahia B, Hamrouni T, Nguifo EM (2006) Frequent closed itemset based algorithms: a thorough structural and analytical survey. SIGKDD Explor Newsl 8(1):93–104

  • Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 721–730

  • Yan X, Han J (2003) Closegraph: mining closed frequent graph patterns. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 286–295

  • Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390

  • Zaki M, Hsiao CJ (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478

  • Zaki MJ, Peters M, Assent I, Seidl T (2007) Clicks: an effective algorithm for mining subspace clusters in categorical datasets. Data Knowl Eng 60(1):51–70

  • Zaki M, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the SIAM international conference on data mining (SDM), pp 457–473

  • Zaki M, Ogihara M (1998) Theoretical foundations of association rules. In: Proceedings of the ACM SIGMOD workshop on research issues in data mining and knowledge discovery, San Diego

Acknowledgments

We are grateful to Michael Mampaey for providing the Smurfig code and data and for his support in using Smurfig, to Siegfried Nijssen for his assistance in using Farmer, and to Thomas Gärtner for discussions on this work. This work was partially funded by the PASCAL 2 Network of Excellence. Eirini Spyropoulou and Tijl De Bie are supported by EPSRC Grant EP/G056447/1. Mario Boley is partially funded by the DFG (German National Research Foundation) under GA 1615/2-1.

Author information

Corresponding author

Correspondence to Eirini Spyropoulou.

Additional information

Responsible editor: M.J. Zaki.

About this article

Cite this article

Spyropoulou, E., De Bie, T. & Boley, M. Interesting pattern mining in multi-relational data. Data Min Knowl Disc 28, 808–849 (2014). https://doi.org/10.1007/s10618-013-0319-9

  • DOI: https://doi.org/10.1007/s10618-013-0319-9
