Interesting Patterns

Abstract

Pattern mining is one of the most important aspects of data mining. By far the most popular and well-known approach is frequent pattern mining, that is, the discovery of patterns that occur in many transactions. This approach has many virtues, including monotonicity, which allows efficient discovery of all frequent patterns. Nevertheless, in practice frequent pattern mining rarely gives good results: the number of discovered patterns is typically gargantuan, and the patterns are heavily redundant.
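
To make the role of monotonicity concrete, the following is a minimal sketch of a levelwise (Apriori-style) search, not the chapter's own implementation. It relies only on the fact that every subset of a frequent itemset is itself frequent, so a candidate of size k is counted only if all of its subsets of size k - 1 were found frequent. The function name `frequent_itemsets`, the parameter `min_support`, and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets that occur in at
    least `min_support` transactions, using a levelwise search."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = sorted({i for t in transactions for i in t})
    level = {}
    for i in items:
        s = support(frozenset([i]))
        if s >= min_support:
            level[frozenset([i])] = s

    result = dict(level)
    k = 2
    while level:
        # Join frequent (k-1)-itemsets into candidates of size k and keep
        # only those whose (k-1)-subsets are all frequent (monotonicity).
        candidates = set()
        for a, b in combinations(list(level), 2):
            c = a | b
            if len(c) == k and all(
                frozenset(sub) in level for sub in combinations(c, k - 1)
            ):
                candidates.add(c)
        # Count only the surviving candidates against the data.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        result.update(level)
        k += 1
    return result


# Hypothetical toy data: five transactions over the items a, b, c.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = frequent_itemsets(toy, min_support=3)
```

Even on this toy data, six of the seven possible non-empty itemsets are frequent at a threshold of three transactions, which hints at why the output tends to be large and redundant on real data.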

Consequently, a lot of research effort has been invested toward improving the quality of the discovered patterns. In this chapter we will give an overview of the interestingness measures and other redundancy reduction techniques that have been proposed to this end.

In particular, we first present classic techniques such as closed and non-derivable itemsets, which are used to prune unnecessary itemsets. We then discuss techniques for ranking patterns by how expected their score is under a null hypothesis, considering patterns that deviate from this expectation to be interesting. These models can be either static or dynamic; in the dynamic case, we iteratively update the model as we discover new patterns. More generally, we also give a brief overview of pattern set mining techniques, where we measure quality over a set of patterns instead of individually. This setup gives us the freedom to explicitly punish redundancy, which leads to more to-the-point results.
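
As a rough illustration of two of these ideas, the sketch below filters a table of itemset supports down to the closed itemsets (those with no proper superset of equal support) and scores an itemset by lift, the ratio of its observed frequency to the frequency expected if its items were independent. The independence model is only one simple choice of static null model, picked here for brevity. The helper names `closed_itemsets` and `lift` and the hand-written support table are assumptions for illustration.

```python
def closed_itemsets(supports):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        x: s
        for x, s in supports.items()
        if not any(x < y and s == sy for y, sy in supports.items())
    }


def lift(itemset, supports, n):
    """Observed frequency divided by the frequency expected under an
    independence model built from the single-item supports."""
    expected = 1.0
    for item in itemset:
        expected *= supports[frozenset([item])] / n
    return (supports[itemset] / n) / expected


# Hypothetical supports for the five transactions ab, ab, ab, ac, c.
n = 5
supports = {
    frozenset("a"): 4,
    frozenset("b"): 3,
    frozenset("c"): 2,
    frozenset("ab"): 3,
    frozenset("ac"): 1,
}

closed = closed_itemsets(supports)
lift_ab = lift(frozenset("ab"), supports, n)
lift_ac = lift(frozenset("ac"), supports, n)
```

Here the itemset {b} is dropped as non-closed because {a, b} has the same support, and {a, b} has lift above 1 (it occurs more often than independence predicts) while {a, c} has lift below 1. A dynamic approach would instead update the model after each reported pattern, which this sketch does not attempt.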

Acknowledgments

Jilles Vreeken is supported by the Cluster of Excellence “Multimodal Computing and Interaction” within the Excellence Initiative of the German Federal Government. Nikolaj Tatti is supported by Academy of Finland grant 118653 (Algodan).

Author information

Corresponding author

Correspondence to Jilles Vreeken.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Vreeken, J., Tatti, N. (2014). Interesting Patterns. In: Aggarwal, C., Han, J. (eds) Frequent Pattern Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-07821-2_5

  • DOI: https://doi.org/10.1007/978-3-319-07821-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07820-5

  • Online ISBN: 978-3-319-07821-2

  • eBook Packages: Computer Science (R0)