Sampling frequent and minimal boolean patterns: theory and application in classification

Data Mining and Knowledge Discovery

Abstract

We tackle the challenging problem of mining the simplest Boolean patterns from categorical datasets. Instead of complete enumeration, which is typically infeasible for this class of patterns, we develop effective sampling methods to extract a representative subset of the minimal Boolean patterns in disjunctive normal form (DNF). We propose a novel theoretical characterization of the minimal DNF expressions, which allows us to prune the pattern search space effectively. Our approach can provide a near-uniform sample of the minimal DNF patterns. We perform an extensive set of experiments to demonstrate the effectiveness of our sampling method. We also show that minimal DNF patterns make effective features for classification.
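To make the objects in the abstract concrete, the following minimal sketch evaluates a DNF pattern on a toy dataset. It is an illustration under stated assumptions, not the authors' sampling method: the dataset, the helper names (`clause_tidset`, `dnf_tidset`, `support`, `is_irredundant`), and the redundancy test (deleting any single item or clause must change the set of covered transactions) are all hypothetical and may differ from the paper's formal definition of minimality.

```python
from functools import reduce

# Toy categorical dataset, already binarized: item -> set of transaction ids.
# Items and ids are made up for illustration.
TIDSETS = {
    "A": {1, 2, 3},
    "B": {2, 3, 4},
    "C": {3, 4, 5},
    "D": {1, 5},
}
ALL_TIDS = set().union(*TIDSETS.values())


def clause_tidset(clause):
    """Transactions that contain every item of an AND-clause."""
    return reduce(set.intersection, (TIDSETS[i] for i in clause), set(ALL_TIDS))


def dnf_tidset(dnf):
    """Transactions that satisfy at least one clause of a DNF expression."""
    return set().union(*(clause_tidset(c) for c in dnf))


def support(dnf):
    """Number of transactions covered by the DNF expression."""
    return len(dnf_tidset(dnf))


def is_irredundant(dnf):
    """Assumed notion of minimality: no single item or clause can be
    deleted without changing the set of covered transactions."""
    target = dnf_tidset(dnf)
    for k, clause in enumerate(dnf):
        rest = dnf[:k] + dnf[k + 1:]
        if dnf_tidset(rest) == target:
            return False  # the whole clause is redundant
        for item in clause:
            smaller = clause - {item}
            if smaller and dnf_tidset(rest + [smaller]) == target:
                return False  # this item is redundant within its clause
    return True


# (A AND B) OR D covers transactions {1, 2, 3, 5}.
pattern = [frozenset({"A", "B"}), frozenset({"D"})]
print(sorted(dnf_tidset(pattern)), support(pattern))

# It is redundant here, because A OR D covers the same transactions.
print(is_irredundant(pattern))
print(is_irredundant([frozenset({"A"}), frozenset({"D"})]))
```

Even in this four-item example the number of candidate DNF expressions grows very quickly with the number of items, which is the motivation the abstract gives for sampling instead of complete enumeration.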

Acknowledgments

This work was supported in part by NSF Awards CCF-1240646 and IIS-1302231.

Author information

Corresponding author

Correspondence to Mohammed J. Zaki.

Additional information

Responsible editor: Bart Goethals.

About this article

Cite this article

Li, G., Zaki, M.J. Sampling frequent and minimal boolean patterns: theory and application in classification. Data Min Knowl Disc 30, 181–225 (2016). https://doi.org/10.1007/s10618-015-0409-y

  • DOI: https://doi.org/10.1007/s10618-015-0409-y
